Description

BACKGROUND OF THE INVENTION 

[0001] The present invention relates to genetic engineering and more particularly to plant transformation in which a
plant is transformed to express a heterologous gene.

[0002] Although great progress has been made in recent years with respect to transgenic plants which express
foreign proteins such as herbicide resistant enzymes and viral coat proteins, very little is known about the major factors
affecting expression of foreign genes in plants. Several potential factors could be responsible in varying degrees for
the level of protein expression from a particular coding sequence. The level of a particular mRNA in the cell is certainly
a critical factor.

[0003] The potential causes of low steady state levels of mRNA due to the nature of the coding sequence are many.
First, full length RNA synthesis might not occur at a high frequency. This could, for example, be caused by the premature
termination of RNA during transcription or by unexpected mRNA processing during transcription. Second, full length
RNA could be produced but then processed (splicing, polyA addition) in the nucleus in a fashion that creates a
nonfunctional mRNA. If the RNA is properly synthesized, terminated and polyadenylated, it then can move to the cytoplasm
for translation. In the cytoplasm, mRNAs have distinct half lives that are determined by their sequences and by the cell
type in which they are expressed. Some RNAs are very short-lived and some are much more long-lived. In addition,
there is an effect, whose magnitude is uncertain, of translational efficiency on mRNA half-life. In addition, every RNA
molecule folds into a particular structure, or perhaps family of structures, which is determined by its sequence. The
particular structure of any RNA might lead to greater or lesser stability in the cytoplasm. Structure per se is probably
also a determinant of mRNA processing in the nucleus. Unfortunately, it is impossible to predict, and nearly impossible
to determine, the structure of any RNA (except for tRNA) in vitro or in vivo. However, it is likely that dramatically changing
the sequence of an RNA will have a large effect on its folded structure. It is likely that structure per se or particular
structural features also have a role in determining RNA stability.

[0004] Some particular sequences and signals have been identified in RNAs that have the potential for having a
specific effect on RNA stability. This section summarizes what is known about these sequences and signals. These
identified sequences often are A+T rich, and thus are more likely to occur in an A+T rich coding sequence such as a
B.t. gene. The sequence motif ATTTA (or AUUUA as it appears in RNA) has been implicated as a destabilizing sequence
in mammalian cell mRNA (Shaw and Kamen, 1986). No analysis of the function of this sequence in plants has been
done. Many short lived mRNAs have A+T rich 3' untranslated regions, and these regions often have the ATTTA
sequence, sometimes present in multiple copies or as multimers (e.g., ATTTATTTA...). Shaw and Kamen showed that
the transfer of the 3' end of an unstable mRNA to a stable RNA (globin or VA1) decreased the stable RNA's half life
dramatically. They further showed that a pentamer of ATTTA had a profound destabilizing effect on a stable message,
and that this signal could exert its effect whether it was located at the 3' end or within the coding sequence. However,
the number of ATTTA sequences and/or the sequence context in which they occur also appear to be important in
determining whether they function as destabilizing sequences. Shaw and Kamen showed that a trimer of ATTTA had
much less effect than a pentamer on mRNA stability and a dimer or a monomer had no effect on stability (Shaw and
Kamen, 1987). Note that multimers of ATTTA such as a pentamer automatically create an A+T rich region. This was
shown to be a cytoplasmic effect, not nuclear. In other unstable mRNAs, the ATTTA sequence may be present in only
a single copy, but it is often contained in an A+T rich region. From the animal cell data collected to date, it appears
that ATTTA at least in some contexts is important in stability, but it is not yet possible to predict which occurrences of
ATTTA are destabilizing elements or whether any of these effects are likely to be seen in plants.
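
The ATTTA occurrences discussed above can be located by simple sequence scanning. The following is a minimal sketch only, not part of the described method: the function names and the spacing rule used to group neighbouring copies into multimers (starts at most 5 bases apart) are assumptions made for illustration, and a hit says nothing about whether the motif is actually destabilizing in a given cell.

```python
import re


def find_attta(seq):
    """Return 0-based start positions of every ATTTA (or AUUUA), overlaps included."""
    dna = seq.upper().replace("U", "T")
    # A zero-width lookahead lets overlapping copies (ATTTATTTA) both be reported.
    return [m.start() for m in re.finditer(r"(?=ATTTA)", dna)]


def group_multimers(starts, max_step=5):
    """Group motif starts spaced at most max_step apart (assumed rule) into runs."""
    runs = []
    for pos in starts:
        if runs and pos - runs[-1][-1] <= max_step:
            runs[-1].append(pos)
        else:
            runs.append([pos])
    return runs


if __name__ == "__main__":
    example = "GGATTTATTTATTTACCGGCATTTAGG"  # made-up sequence, illustration only
    for run in group_multimers(find_attta(example)):
        print("start", run[0], "copies", len(run))
```

A run of one copy corresponds to a monomer, a run of three to a trimer, and so on; whether any such run matters in plants is exactly what the text above leaves open.
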
[0005] Some studies on mRNA degradation in animal cells also indicate that RNA degradation may begin in some
cases with nucleolytic attack in A+T rich regions. It is not clear if these cleavages occur at ATTTA sequences. There
are also examples of mRNAs that have differential stability depending on the cell type in which they are expressed or
on the stage within the cell cycle at which they are expressed. For example, histone mRNAs are stable during DNA
synthesis but unstable if DNA synthesis is disrupted. The 3' end of some histone mRNAs seems to be responsible for
this effect (Pandey and Marzluff, 1987). It does not appear to be mediated by ATTTA, nor is it clear what controls the
differential stability of this mRNA. Another example is the differential stability of IgG mRNA in B lymphocytes during B
cell maturation (Genovese and Milcarek, 1988). A final example is the instability of a mutant beta-thalassemic globin
mRNA. In bone marrow cells, where this gene is normally expressed, the mutant mRNA is unstable, while the
wild-type mRNA is stable. When the mutant gene is expressed in HeLa or L cells in vitro, the mutant mRNA shows no
instability (Lim et al., 1988). These examples all provide evidence that mRNA stability can be mediated by cell type or
cell cycle specific factors. Furthermore, this type of instability is not yet associated with specific sequences. Given these
uncertainties, it is not possible to predict which RNAs are likely to be unstable in a given cell. In addition, even the
ATTTA motif may act differentially depending on the nature of the cell in which the RNA is present. Shaw and Kamen
(1987) have reported that activation of protein kinase C can block degradation mediated by ATTTA.
[0006] The addition of a polyadenylate string to the 3' end is common to most eucaryotic mRNAs, both plant and
animal. The currently accepted view of polyA addition is that the nascent transcript extends beyond the mature 3'
terminus. Contained within this transcript are signals for polyadenylation and proper 3' end formation. This processing
at the 3' end involves cleavage of the mRNA and addition of polyA to the mature 3' end. By searching for consensus
sequences near the polyA tract in both plant and animal mRNAs, it has been possible to identify consensus sequences
that apparently are involved in polyA addition and 3' end cleavage. The same consensus sequences seem to be
important to both of these processes. These signals are typically a variation on the sequence AATAAA. In animal cells,
some variants of this sequence that are functional have been identified; in plant cells there seems to be an extended
range of functional sequences (Wickens and Stephenson, 1984; Dean et al., 1986). Because all of these consensus
sequences are variations on AATAAA, they all are A+T rich sequences. This sequence is typically found 15 to 20 bp
before the polyA tract in a mature mRNA. Experiments in animal cells indicate that this sequence is involved in both
polyA addition and 3' maturation. Site directed mutations in this sequence can disrupt these functions (Conway and
Wickens, 1988; Wickens et al., 1987). However, it has also been observed that sequences up to 50 to 100 bp 3' to the
putative polyA signal are also required; i.e., a gene that has a normal AATAAA but has been replaced or disrupted
downstream does not get properly polyadenylated (Gil and Proudfoot, 1984; Sadofsky and Alwine, 1984; McDevitt et
al., 1984). That is, the polyA signal itself is not sufficient for complete and proper processing. It is not yet known what
specific downstream sequences are required in addition to the polyA signal, or if there is a specific sequence that has
this function. Therefore, sequence analysis can only identify potential polyA signals.
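
As an illustration of what such sequence analysis amounts to, the sketch below lists every hexamer that matches AATAAA exactly or differs from it at a single position. Treating "a variation on AATAAA" as "at most one mismatch" is an assumption made here for illustration; the text does not define the set of functional variants, and the function names are hypothetical. The hits are only potential signals, for the reasons given above.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_polya_like(seq, consensus="AATAAA", max_mismatch=1):
    """Return (start, hexamer, mismatches) for each window close to the consensus."""
    dna = seq.upper().replace("U", "T")
    k = len(consensus)
    hits = []
    for i in range(len(dna) - k + 1):
        window = dna[i:i + k]
        distance = hamming(window, consensus)
        if distance <= max_mismatch:  # assumed tolerance, not taken from the text
            hits.append((i, window, distance))
    return hits


if __name__ == "__main__":
    example = "GGGAATAAAGGGCCAATTAACC"  # made-up sequence, illustration only
    for start, hexamer, distance in find_polya_like(example):
        print(start, hexamer, distance)
```
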

[0007] In naturally occurring mRNAs that are normally polyadenylated, it has been observed that disruption of this
process, either by altering the polyA signal or other sequences in the mRNA, can have profound effects on the
level of functional mRNA. This has been observed in several naturally occurring mRNAs, with results that are gene
specific so far. There are no general rules that can be derived yet from the study of mutants of these natural genes,
and no rules that can be applied to heterologous genes. Below are four examples:

1. In a globin gene, absence of a proper polyA site leads to improper termination of transcription. It is likely, but
not proven, that the improperly terminated RNA is nonfunctional and unstable (Proudfoot et al., 1987).

2. In a globin gene, absence of a functional polyA signal can lead to a 100-fold decrease in the level of mRNA
accumulation (Proudfoot et al., 1987).

3. A globin gene polyA site was placed into the 3' ends of two different histone genes. The histone genes contain
a secondary structure (stem-loop) near their 3' ends. The amount of properly polyadenylated histone mRNA
produced from these chimeras decreased as the distance between the stem-loop and the polyA site increased. Also,
the two histone genes produced greatly different levels of properly polyadenylated mRNA. This suggests an
interaction between the polyA site and other sequences on the mRNA that can modulate mRNA accumulation (Pandey
and Marzluff, 1987).

4. The soybean leghemoglobin gene has been cloned into HeLa cells, and it has been determined that this plant
gene contains a "cryptic" polyadenylation signal that is active in animal cells, but is not utilized in plant cells. This
leads to the production of a new polyadenylated mRNA that is nonfunctional. This again shows that analysis of a
gene in one cell type cannot predict its behavior in alternative cell types (Wiebauer et al., 1988).

[0008] From these examples, it is clear that in natural mRNAs proper polyadenylation is important in mRNA
accumulation, and that disruption of this process can affect mRNA levels significantly. However, insufficient knowledge
exists to predict the effect of changes in a normal gene. In a heterologous gene, where we do not know if the putative
polyA sites (consensus sequences) are functional, it is even harder to predict the consequences. However, it is possible
that the putative sites identified are dysfunctional. That is, these sites may not act as proper polyA sites, but instead
function as aberrant sites that give rise to unstable mRNAs.

[0009] In animal cell systems, AATAAA is by far the most common signal identified in mRNAs upstream of the polyA,
but at least four variants have also been found (Wickens and Stephenson, 1984). In plants, not nearly so much analysis
has been done, but it is clear that multiple sequences similar to AATAAA can be used. The plant sites below called
major or minor refer only to the study of Dean et al. (1986), which analyzed only three types of plant gene. The
designation of polyadenylation sites as major or minor refers only to the frequency of their occurrence as functional sites in
naturally occurring genes that have been analyzed. In the case of plants this is a very limited database. It is hard to
predict with any certainty that a site designated major or minor is more or less likely to function partially or completely
when found in a heterologous gene such as B.t.
[0010] Another type of RNA processing that occurs in the nucleus is intron splicing. Nearly all of the work on intron
processing has been done in animal cells, but some data are emerging from plants. Intron processing depends on proper
5' and 3' splice junction sequences. Consensus sequences for these junctions have been derived for both animal and
plant mRNAs, but only a few nucleotides are known to be invariant. Therefore, it is hard to predict with any certainty
whether a putative splice junction is functional or partially functional based solely on sequence analysis. In particular,
the only invariant nucleotides are GT at the 5' end of the intron and AG at the 3' end of the intron. In plants, at every
nearby position, either within the intron or in the exon flanking the intron, all four nucleotides can be found, although
some positions show some nucleotide preference (Brown, 1986; Hanley and Schuler, 1988).

[0011] A plant intron has been moved from a patatin gene into a GUS gene. To do this, site directed mutagenesis
was performed to introduce new restriction sites, and this mutagenesis changed several nucleotides in the intron and
exon sequences flanking the GT and AG. This intron still functioned properly, indicating the importance of the GT and
AG and the flexibility at other nucleotide positions. There are of course many occurrences of GT and AG in all genes
that do not function as intron splice junctions, so there must be some other sequence or structural features that identify
splice junctions. In plants, one such feature appears to be base composition per se. Wiebauer et al. (1988) and Goodall
et al. (1988) have analyzed plant introns and exons and found that exons have ~50% A+T while introns have ~70%
A+T. Goodall et al. (1988) also created an artificial plant intron that has consensus 5' and 3' splice junctions and a
random A+T rich internal sequence. This intron was spliced correctly in plants. When the internal segment was replaced
by a G+C rich sequence, splicing efficiency was drastically reduced. These two examples demonstrate that intron
recognition in plants may depend on very general features: splice junctions that have a great deal of sequence
diversity and A+T richness of the intron itself. This, of course, makes it difficult to predict from sequence alone whether
any particular sequence is likely to function as an active or partially active intron for RNA processing.
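
The two general features named above (a GT...AG boundary pair and an A+T rich interior) can be combined into a crude candidate search. This is a sketch under stated assumptions, not a splice-site predictor: the length limits are hypothetical parameters chosen for illustration, the 70% cutoff simply reuses the intron figure quoted above, and, as the text stresses, most candidates found this way will not be functional introns.

```python
def at_fraction(seq):
    """Fraction of A and T bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq) if seq else 0.0


def candidate_introns(seq, min_len=70, max_len=1000, min_at=0.70):
    """List (start, end, A+T fraction) for GT...AG spans that are A+T rich.

    min_len and max_len are illustrative assumptions; end is exclusive.
    """
    dna = seq.upper().replace("U", "T")
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    found = []
    for d in donors:
        for a in acceptors:
            length = a + 2 - d  # span runs from the G of GT through the G of AG
            if min_len <= length <= max_len:
                frac = at_fraction(dna[d:a + 2])
                if frac >= min_at:
                    found.append((d, a + 2, round(frac, 2)))
    return found


if __name__ == "__main__":
    at_rich = "CC" + "GT" + "A" * 20 + "AG" + "CC"   # made-up sequences
    gc_rich = "CC" + "GT" + "GC" * 10 + "AG" + "CC"
    print(candidate_introns(at_rich, min_len=10))
    print(candidate_introns(gc_rich, min_len=10))
```
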
[0012] B.t. genes, being A+T rich, contain numerous stretches of various lengths that have 70% or greater A+T. The
number of such stretches identified by sequence analysis depends on the length of sequence scanned.
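
The dependence on scan length can be seen with a sliding-window calculation. The sketch below is illustrative only; the window sizes and the function names are assumptions, and only the 70% threshold comes from the text.

```python
def at_rich_windows(seq, window=20, threshold=0.70):
    """Return start positions of windows whose A+T fraction meets the threshold."""
    dna = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(dna) - window + 1):
        w = dna[i:i + window]
        if (w.count("A") + w.count("T")) / window >= threshold:
            hits.append(i)
    return hits


def merge_windows(starts, window):
    """Merge overlapping or touching windows into (start, end) stretches, end exclusive."""
    stretches = []
    for s in starts:
        if stretches and s <= stretches[-1][1]:
            stretches[-1][1] = max(stretches[-1][1], s + window)
        else:
            stretches.append([s, s + window])
    return [tuple(x) for x in stretches]


if __name__ == "__main__":
    example = "G" * 10 + "A" * 10 + "G" * 10  # made-up sequence
    for size in (10, 20):  # the same sequence scanned with two window lengths
        starts = at_rich_windows(example, window=size)
        print(size, merge_windows(starts, size))
```

With the made-up sequence in the example, a short window reports an A+T rich stretch that a longer window averages away, which is the effect the paragraph above describes.
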

[0013] As for polyadenylation described above, there are complications in predicting what sequences might be
utilized as splice sites in any given gene. First, many naturally occurring genes have alternative splicing pathways that
create alternative combinations of exons in the final mRNA (Gallega and Nadal-Ginard, 1988; Helfman and Ricci, 1988;
Tsurushita and Korn, 1989). That is, some splice junctions are apparently recognized under some circumstances or
in certain cell types, but not in others. The rules governing this are not understood. In addition, there can be an interaction
between processing paths such that utilization of a particular polyadenylation site can interfere with splicing at a nearby
splice site and vice versa (Adami and Nevins, 1988; Brady and Wold, 1988; Marzluff and Pandey, 1988). Again no
predictive rules are available. Also, sequence changes in a gene can drastically alter the utilization of particular splice
junctions. For example, in a bovine growth hormone gene, small deletions in an exon a few hundred bases downstream
of an intron cause the splicing efficiency of the intron to drop from greater than 95% to less than 2% (essentially
nonfunctional). Other deletions however have essentially no effect (Hampson and Rottman, 1988). Finally, a variety
of in vitro and in vivo experiments indicate that mutations that disrupt normal splicing lead to rapid degradation of the
RNA in the nucleus. Splicing is a multistep process in the nucleus and mutations in normal splicing can lead to blockades
in the process at a variety of steps. Any of these blockades can then lead to an abnormal and unstable RNA. Studies
of mutants of normally processed (polyadenylation and splicing) genes are relevant to the study of heterologous genes
such as B.t. B.t. genes might contain functional signals that lead to the production of aberrant nonfunctional mRNAs,
and these mRNAs are likely to be unstable. But the B.t. genes are perhaps even more likely to contain signals that are
analogous to mutant signals in a natural gene. As shown above, these mutant signals are very likely to cause defects
in the processing pathways whose consequence is to produce unstable mRNAs.

[0014] It is not known with any certainty what signals RNA transcription termination in plant or animal cells. Some
studies on animal genes indicate that stretches of sequence rich in T cause termination by calf thymus RNA
polymerase II in vitro. These studies have shown that the 3' ends of in vitro terminated transcripts often lie within runs
of T such as T5, T6 or T7. Other identified sites have not been composed solely of T, but have had one or more other
nucleotides as well. Termination has been found to occur within the sequences TATTTTTT, ATTCTC, TTCTT (Dedrick
et al., 1987; Reines et al., 1987). In the case of these latter two, the context in which the sequence is found has been
C+T rich as well. It is not known if this is essential. Other studies have implicated stretches of A as potential
transcriptional terminators. An interesting example from SV40 illustrates the uncertainty in defining terminators based on
sequence alone. One potential terminator in SV40 was identified as being A rich and having a region of dyad symmetry
(potential stem-loop) 5' to the A rich stretch. However, a second terminator identified experimentally downstream in
the same gene was not A rich and included no potential secondary structure (Kessler et al., 1988). Of course, due to
the A+T content of B.t. genes, they are rich in runs of A or T that could act as terminators. The importance of termination
to stability of the mRNA is shown by the globin gene example described above. Absence of a normal polyA site leads
to a failure in proper termination with a consequent decrease in mRNA.
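
Runs of A or T of the kind mentioned above (T5, T6, T7 and longer) are easy to enumerate, although, as the SV40 example shows, finding one does not establish that it terminates transcription. A minimal sketch, with the function name and the default minimum run length of five taken as assumptions for illustration:

```python
import re


def find_homopolymer_runs(seq, bases="AT", min_len=5):
    """Return (start, run) for each run of a single base at least min_len long."""
    dna = seq.upper().replace("U", "T")
    # Builds a pattern such as "A{5,}|T{5,}" from the requested bases.
    pattern = "|".join(f"{b}{{{min_len},}}" for b in bases)
    return [(m.start(), m.group()) for m in re.finditer(pattern, dna)]


if __name__ == "__main__":
    example = "GGTTTTTTCCAAAAAGGTTTT"  # made-up sequence, illustration only
    for start, run in find_homopolymer_runs(example):
        print(start, run[0], len(run))
```
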

[0015] There is also an effect on mRNA stability due to the translation of the mRNA. Premature translational termination
in human triose phosphate isomerase leads to instability of the mRNA (Daar et al., 1988). Another example is the
beta-thalassemic globin mRNA described above that is specifically unstable in bone marrow cells (Lim et al., 1988). The
defect in this mutant gene is a single base pair deletion at codon 44 that leads to translational termination (a nonsense
codon) at codon 60. Compared to properly translated normal globin mRNA, this mutant RNA is very unstable. These
results indicate that an improperly translated mRNA is unstable. Other work in yeast indicates that proper but poor
translation can have an effect on mRNA levels. A heterologous gene was modified to convert certain codons to more
yeast preferred codons. An overall 10-fold increase in protein production was achieved, but there was also about a
3-fold increase in mRNA (Hoekema et al., 1987). This indicates that more efficient translation can lead to greater mRNA
stability, and that the effect of codon usage can be at the RNA level as well as the translational level. It is not clear
from codon usage studies which codons lead to poor translation, or how this is coupled to mRNA stability.
[0016] EP-A-0 359 472 discloses modifying B.t. sequences to render them more plant-like. The sequence is modified
so that the codon usage in the sequence is approximately the same as the codon usage in a plant. In contrast, the
claimed invention is related to a specific methodology for increasing the expression of the gene in a plant by removing
the occurrence of particular DNA sequences.

[0017] Therefore, it is an object of the present invention to provide a method for preparing synthetic plant genes
which express their respective proteins at relatively high levels when compared to wild-type genes. It is yet another
object of the present invention to provide synthetic plant genes which express the crystal protein toxin of Bacillus
thuringiensis at relatively high levels.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0018] 

Figure 1 illustrates the steps employed in modifying a wild-type gene to increase expression efficiency in plants.
Figure 2 illustrates a comparison of the changes in the modified B.t.k. HD-1 sequence of Example 1 (lower line)
versus the wild-type sequence of B.t.k. HD-1 which encodes the crystal protein toxin (upper line).
Figure 3 illustrates a comparison of the changes in the synthetic B.t.k. HD-1 sequence of Example 2 (lower line)
versus the wild-type sequence of B.t.k. HD-1 which encodes the crystal protein toxin (upper line).
Figure 4 illustrates a comparison of the changes in the synthetic B.t.k. HD-73 sequence of Example 3 (lower line)
versus the wild-type sequence of B.t.k. HD-73 (upper line).
Figure 5 represents a plasmid map of intermediate plant transformation vector cassette pMON893.
Figure 6 represents a plasmid map of intermediate plant transformation vector cassette pMON900.
Figure 7 represents a map for the disarmed T-DNA of A. tumefaciens ACO.
Figure 8 illustrates a comparison of the changes in the synthetic truncated B.t.k. HD-73 gene (amino acids 29-615
with an N-terminal Met-Ala) of Example 3 (lower line) versus the wild-type sequence of B.t.k. HD-73 (upper line).
Figure 9 illustrates a comparison of the changes in the synthetic/wild-type full length B.t.k. HD-73 sequence of
Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).
Figure 10 illustrates a comparison of the changes in the synthetic/modified full length B.t.k. HD-73 sequence of
Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).
Figure 11 illustrates a comparison of the changes in the fully synthetic full-length B.t.k. HD-73 sequence of Example
3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).
Figure 12 illustrates a comparison of the changes in the synthetic B.t.t. sequence of Example 5 (lower line) versus
the wild-type sequence of B.t.t. which encodes the crystal protein toxin (upper line).
Figure 13 illustrates a comparison of the changes in the synthetic B.t. P2 sequence of Example 6 (lower
Figure 14 illustrates a comparison of the changes in the synthetic B.t. entomocidus sequence of Example 7 (lower
line) versus the wild-type sequence of B.t. entomocidus which encodes the Btent protein toxin (upper line).
Figure 15 illustrates a plasmid map for plant expression cassette vector pMON744.
Figure 16 illustrates a comparison of the changes in the synthetic potato leaf roll virus (PLRV) coat protein sequence
of Example 9 (lower line) versus the wild-type coat protein sequence of PLRV (upper line).

STATEMENT OF THE INVENTION 

[0019] The present invention provides a method for modifying a wild-type structural gene sequence which encodes
an insecticidal protein of Bacillus thuringiensis to enhance the expression of said protein in plants which comprises:

a) identifying regions within said sequence with greater than four consecutive adenine or thymine nucleotides;

b) modifying the regions of step (a) which have two or more polyadenylation signals within a ten base sequence
to remove said signals while maintaining a gene sequence which encodes said protein; and

c) modifying the 15-30 base regions surrounding the regions of step (a) to remove major plant polyadenylation
signals, consecutive sequences containing more than one minor polyadenylation signal and consecutive
sequences containing more than one ATTTA sequence while maintaining a gene sequence which encodes said protein.
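
Step (a), and the selection of the surrounding region examined in step (c), can be sketched as a simple scan. This is an illustration under stated assumptions and not the claimed procedure itself: the function names are hypothetical, the default padding of 15 bases is one arbitrary choice within the 15-30 base range, and the redesign of the sequence (which must preserve the encoded protein) is not attempted here.

```python
import re


def at_runs(seq, min_run=5):
    """Step (a): (start, end) of regions with more than four consecutive A or T bases."""
    dna = seq.upper()
    return [(m.start(), m.end()) for m in re.finditer(r"[AT]{%d,}" % min_run, dna)]


def surrounding_region(seq, start, end, pad=15):
    """Bounds of the region extended by pad bases on each side, clipped to the sequence."""
    return max(0, start - pad), min(len(seq), end + pad)


def count_attta(seq):
    """Number of ATTTA occurrences, overlapping copies included."""
    return len(re.findall(r"(?=ATTTA)", seq.upper()))


if __name__ == "__main__":
    example = "GCGCGGCCATTTATTTAGGCCGCGC"  # made-up sequence, illustration only
    for start, end in at_runs(example):
        lo, hi = surrounding_region(example, start, end)
        print((start, end), (lo, hi), count_attta(example[lo:hi]))
```

A region whose surroundings contain more than one ATTTA would be one of the candidates for modification under step (c); identifying major and minor polyadenylation signals requires a list of those signals, which this sketch does not assume.
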

[0020] The invention further provides a method for modifying a wild-type structural gene sequence which encodes
an insecticidal protein of Bacillus thuringiensis to enhance the expression of said protein in plants which comprises:

a) removing polyadenylation signals contained in said wild-type gene while retaining a sequence which encodes
said protein; and

b) removing ATTTA sequences contained in said wild-type gene while retaining a sequence which encodes said
protein.

[0021] According to a further embodiment a method for improving the expression of a heterologous gene in plants
is provided, wherein said gene comprises a modified chimeric gene containing a promoter which functions in plant
cells operably linked to a structural coding sequence and a 3' non-translated region containing a polyadenylation signal
which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, and wherein said
structural coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus
thuringiensis protein, wherein said method comprises modifying said structural coding sequence so that said sequence
has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis
protein and said structural coding sequence does not contain more than 5 consecutive nucleotides consisting of either
adenine or thymine residues.

[0022] As a further embodiment, a method for improving the expression of a heterologous gene in plants is provided,
wherein said gene comprises a modified chimeric gene containing a promoter which functions in plant cells operably
linked to a structural coding sequence and a 3' non-translated region containing a polyadenylation signal which
functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural
coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis
protein, wherein said method comprises modifying said structural coding sequence so that said sequence has a DNA
sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and
has the following characteristics:

said structural coding sequence has a region which is complementary to the following sequence:

GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC
1 5 10 15 20 25 30 35 40 45

said region in said coding sequence having eliminated 2 AACCAA and 1 AATTAA sequence.
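
The complementary region referred to above can be derived and checked mechanically. The sketch below assumes that the listed sequence is written 5' to 3', so that the matching region of the coding strand is its reverse complement; that reading, and the function name, are assumptions made for illustration.

```python
# Sequence as listed in the text; assumed here to be written 5' to 3'.
LISTED = "GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    region = reverse_complement(LISTED)
    print(region)
    # Check that the motifs said to have been eliminated are indeed absent.
    for motif in ("AACCAA", "AATTAA"):
        print(motif, "present" if motif in region else "absent")
```
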

[0023] The present invention provides a method for preparing synthetic plant genes which encode the crystal protein
toxin of Bacillus thuringiensis (B.t.). Suitable B.t. subspecies include, but are not limited to, B.t. kurstaki HD-1, B.t.
kurstaki HD-73, B.t. sotto, B.t. berliner, B.t. thuringiensis, B.t. tolworthi, B.t. dendrolimus, B.t. alesti, B.t. galleriae, B.t.
aizawai, B.t. subtoxicus, B.t. entomocidus, B.t. tenebrionis and B.t. san diego.

[0024] The expression of B.t. genes in plants is problematic. Although the expression of B.t. genes in plants at
insecticidal levels has been reported, this accomplishment has not been straightforward. In particular, the expression of
a full-length lepidopteran specific B.t. gene (comprising DNA from a B.t.k. isolate) has been reported to be unsuccessful
in yielding insecticidal levels of expression in some plant species (Vaeck et al., 1987 and Barton et al., 1987).
[0025] It has been reported that expression of the full-length gene from B.t.k. HD-1 was detectable in tomato plants
but that truncated genes led to a higher frequency of insecticidal plants with an overall higher level of expression.
Truncated genes of B.t. berliner also led to a higher frequency of insecticidal plants in tobacco (Vaeck et al., 1987).
On the other hand, insecticidal plants were produced from lettuce transformants using a full-length gene.
[0026] It has also been reported that the full length gene from B.t.k. HD-73 gave some insecticidal effect in tobacco
(Adang et al., 1987). However, the B.t. mRNA detected in these plants was only 1.7 kb compared to the expected 3.7
kb, indicating improper expression of the gene. It was suggested that this truncated mRNA was too short to encode a
functional truncated toxin, but there must have been a low level of longer mRNA in some plants or no insecticidal
activity would have been observed. Others have reported in a publication that they observed a large amount of shorter
than expected mRNA from a truncated B.t.k. gene, but some mRNA of the expected size was also observed. In fact,
it was suggested that expression of the full length gene is toxic to tobacco callus (Barton et al., 1987). The above
illustrates that lepidopteran type B.t. genes are poorly expressed in plants compared to other chimeric genes previously
expressed from the same promoter cassettes.
[0027] The expression of B.t.t. in tomato and potato is at levels similar to that of B.t.k. (i.e., poor). B.t.t. and B.t.k.
genes share only limited sequence homology, but they share many common features in terms of base composition
and the presence of particular A+T rich elements.

[0028] All reports in the field have noted the lower than expected expression of B.t. genes in plants. In general,
insecticidal efficacy has been measured using insects very sensitive to B.t. toxin such as tobacco hornworm. Although
it has been possible to obtain plants totally protected against tobacco hornworm, it is important to note that hornworm
is up to 500 fold more sensitive to B.t. toxin than some agronomically important insect pests such as beet armyworm.
It is therefore of interest to obtain transgenic plants that are protected against all important lepidopteran pests (or
against Colorado potato beetle in the case of B.t. tenebrionis), and in addition to have a level of B.t. expression that
provides an additional safety margin over and above the efficacious protection level. It is also important to devise plant
genes which function reproducibly from species to species, so that insect resistant plants can be obtained in a
predictable fashion.

[0029] In order to achieve these goals, it is important to understand the nature of the poorer than expected expression
of B.t. genes in plants. The level of stable B.t. mRNA in plants is much lower than expected. That is, compared to other
coding sequences driven by the same promoter, the level of B.t. mRNA measured by Northern analysis or nuclease
protection experiments is much lower. For example, tomato plant 337 (Fischhoff et al., 1987) was selected as the best
expressing plant with pMON9711, which contains the B.t.k. HD-1 KpnI fragment driven by the CaMV 35S promoter and
contains the NOS-NPTII-NOS selectable marker gene. In this plant the level of B.t. mRNA is between 100 and 1000 fold
lower than the level of NPTII mRNA, even though the 35S promoter is approximately 50-fold stronger than the NOS
promoter (Sanders et al., 1987).
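
The size of the discrepancy in this example can be made explicit with a little arithmetic. The sketch below rests on a deliberately naive assumption that is not made in the text: that steady state mRNA would scale in direct proportion to promoter strength if the two transcripts were otherwise equivalent. Under that assumption only, the implied shortfall is the product of the two fold-differences quoted above.

```python
def implied_shortfall(observed_fold_lower, promoter_fold_stronger):
    """Fold difference between naively expected and observed mRNA levels.

    Assumes (for illustration only) that mRNA level is proportional to
    promoter strength and that nothing else differs between transcripts.
    """
    return observed_fold_lower * promoter_fold_stronger


if __name__ == "__main__":
    for observed in (100, 1000):  # B.t. mRNA is 100 to 1000 fold below NPTII mRNA
        print(observed, implied_shortfall(observed, 50))  # 35S ~50-fold stronger than NOS
```
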

[0030] The level of B.t. toxin protein detected in plants is consistent with the low level of B.t. mRNA. Moreover, the
insecticidal efficacy of the transgenic plants correlates with the B.t. protein level, indicating that the toxin protein
produced in plants is biologically active. Therefore, the low level of B.t. toxin expression may be the result of the low levels
of B.t. mRNA.



[0031] Messenger RNA levels are determined by the rate of synthesis and rate of degradation. It is the balance
between these two that determines the steady state level of mRNA. The rate of synthesis has been maximized by the
use of the CaMV 35S promoter, a strong plant expressible promoter. The use of other plant promoters
such as nopaline synthase (NOS), mannopine synthase (MAS) and ribulose bisphosphate carboxylase small subunit
have not led to dramatic changes in the levels of B.t. toxin protein expression, indicating that the effects
determining B.t. toxin protein levels are promoter independent. These data imply that the coding sequences of DNA
genes encoding B.t. toxin proteins are somehow responsible for the poor expression level, and that this effect is
manifested by a low level of accumulated stable mRNA.

[0032] Lower than expected levels of mRNA have been observed with four different lepidopteran specific genes (two
[...] It
appears that for lepidopteran type B.t. genes these effects are manifest more strongly in the full length coding sequences
than in the truncated coding sequences. These effects are seen across plant species although their magnitude seems
greater in some plant species such as tobacco.

[0033] The nature of the coding sequences of B.t. genes distinguishes them from plant genes as well as many other
heterologous genes expressed in plants. In particular, B.t. genes are very rich (~62%) in adenine (A) and thymine (T),
while plant genes and most bacterial genes which have been expressed in plants are on the order of 45-55% A+T.
The A+T content of the genomes (and thus the genes) of any organism are features of that organism and reflect its
evolutionary history. While within any one organism genes have similar A+T content, the A+T content can vary tremendously
from organism to organism. For example, some Bacillus species have among the most A+T rich genomes while some
Streptomyces species are among the least A+T rich genomes (~30 to 35% A+T).

[0034] Due to the degeneracy of the genetic code and the limited number of codon choices for any amino acid, most
[...] of the structural coding sequences of some Bacillus species are found in the third position of the
codons. That is, genes of some Bacillus species have A or T as the third nucleotide in many codons. Thus A+T content
[...]
in which they evolve. This means that particular nucleotide sequences found in a gene from one organism, where they
may play no role except to code for a particular stretch of amino acids, have the potential to be recognized as gene
control elements in another organism (such as transcriptional promoters or terminators, polyA addition sites, intron
splice sites, or specific mRNA degradation signals). It is perhaps surprising that such misread signals are not a more
common feature of heterologous gene expression, but this can be explained in part by the relatively homogeneous
A+T content (~50%) of many organisms. This A+T content plus the nature of the genetic code put clear constraints
on the likelihood of occurrence of any particular oligonucleotide sequence. Thus, a gene from E. coli with a 50% A+T
content is much less likely to contain any particular A+T rich segment than a gene from B. thuringiensis.

[0035] As described above, the expression of B.t. toxin protein in plants has been problematic. Although the
observations made in other systems described above offer the hope of a means to elevate the expression level of B.t. toxin
proteins in plants, the success obtained by the present method is quite unexpected. Indeed, inasmuch as it has been
recently reported that expression of the full-length B.t.k. toxin protein in tobacco makes callus tissue necrotic (Barton
et al., 1987), one would reasonably expect that high level expression of B.t. toxin protein to be unattainable due to the
[...]
[0036] In its most rigorous application, the method of the present invention involves the modification of an existing
structural coding sequence ("structural gene") which codes for a particular protein by removal of ATTTA sequences
and putative polyadenylation signals [...]
substantially all the polyadenylation signals and ATTTA sequences are removed, although enhanced
expression levels are observed with only partial removal of either of the above identified sequences. Alternately, if a
synthetic gene is prepared which codes for the expression of the subject protein, codons are selected to avoid the
ATTTA sequence and putative polyadenylation signals. For purposes of the present invention putative polyadenylation
signals include, but are not necessarily limited to, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA,
ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA. In replacing the ATTTA
sequences and polyadenylation signals, codons are preferably utilized which avoid the codons which are rarely found
[...]



[0037] Another embodiment of the present invention, represented in the flow diagram of Figure 1, employs a method
[...]
is somewhat less rigorous than the method first described above. Referring to Figure 1, the selected DNA sequence
is scanned to identify regions with greater than four consecutive adenine (A) or thymine (T) nucleotides. The A+T
regions are scanned for potential plant polyadenylation signals. Although the absence of five or more consecutive A
or T nucleotides eliminates most plant polyadenylation signals, if there are more than one of the minor polyadenylation
[...]
to remove these signals while maintaining the original encoded amino acid sequence.

[0038] The second step is to consider the 15 to 30 nucleotide regions surrounding the A+T rich region identified in
step one. If the A+T content of the surrounding region is less than 80%, the region should be examined for
polyadenylation signals. Alteration of the region based on polyadenylation signals is dependent upon (1) the number of
polyadenylation signals present and (2) presence of a major plant polyadenylation signal.

[0039] The extended region is examined for the presence of plant polyadenylation signals. The polyadenylation
signals are removed by site-directed mutagenesis of the DNA sequence. The extended region is also examined for
multiple copies of the ATTTA sequence, which are also removed by mutagenesis.
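
By way of illustration only, the scanning steps described above (runs of five or more consecutive A or T, putative polyadenylation signals, and ATTTA sequences) can be sketched in Python. The function names and the returned dictionary layout are assumptions made for this sketch and are not part of the described method; the signal list is the one recited above.

```python
import re

# Putative polyadenylation signals as recited in the text above.
POLYA_SIGNALS = [
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA", "ATAAAA",
    "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA", "ATTAAA", "AATTAA",
    "AATACA", "CATAAA",
]


def find_all(seq, motif):
    """Return 0-based start positions of every match, including overlapping ones."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def at_runs(seq, min_len=5):
    """Return (start, end) spans of min_len or more consecutive A/T bases."""
    return [m.span() for m in re.finditer(f"[AT]{{{min_len},}}", seq)]


def scan(seq):
    """Hypothetical helper: collect the three kinds of features for one DNA sequence."""
    seq = seq.upper()
    return {
        "at_runs": at_runs(seq),
        "polya": {s: find_all(seq, s) for s in POLYA_SIGNALS if s in seq},
        "attta": find_all(seq, "ATTTA"),
    }
```

A call such as `scan("GGCAATAAAGGCATTTAGC")` reports the A+T runs, the signals found, and the ATTTA positions; deciding which of these to alter remains a matter of judgment as described in the text.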

[0040] It is also preferred that regions comprising many consecutive A+T bases or G+C bases are disrupted, since
insertion of heterogeneous base pairs would reduce the likelihood of self-complementary secondary structure formation,
which is known to inhibit transcription and/or translation in some organisms. In most cases, the adverse effects may
be minimized by using sequences which do not contain more than five consecutive A+T or G+C.

SYNTHETIC OLIGONUCLEOTIDES FOR MUTAGENESIS

[0041] The oligonucleotides used in the mutagenesis are designed to maintain the proper amino acid sequence and
reading frame and preferably to not introduce common restriction sites such as BglII, HindIII, SacI, KpnI, EcoRI, NcoI
[...]. These restriction sites are found in multilinker insertion sites of cloning vectors
such as [...] pMON7258. The introduction of new polyadenylation signals, ATTTA sequences
or consecutive stretches of more than five A+T or G+C should also be avoided. The preferred size for the
oligonucleotides [...]
base pairs of homology to the template DNA on both ends of the synthesized fragment are maintained
to insure proper hybridization of the primer [...]
than five base pairs A+T or G+C. Codons used in the replacement of wild-type codons should preferably avoid the TA
or CG doublet wherever possible. Codons are selected from a plant preferred codon table (such as Table I below) so
[...]



Table I

Preferred Codon Usage in Plants

Amino Acid   Codon   Percent Usage in Plants
ARG          CGA       7
             CGC      11
             CGG       5
             CGU      25
             AGA      29
             AGG      23
LEU          CUA       8
             CUC      20
             CUG      10
             CUU      28
             UUA       5
             UUG      30
SER          UCA      14
             UCC      26
             UCG       3
             UCU      21
             AGC      21
             AGU      15
THR          ACA      21
             ACC      41
             ACG       7
             ACU      31
PRO          CCA      45
             CCC      19
             CCG       9
             CCU      26
ALA          GCA      23
             GCC      32
             GCG       3
             GCU      41
GLY          GGA      32
             GGC      20
             GGG      11
             GGU      37
ILE          AUA      12
             AUC      45
             AUU      43
VAL          GUA       9
             GUC      20
             GUG      28
             GUU      43
LYS          AAA      36
             AAG      64
ASN          AAC      72
             AAU      28
GLN          CAA      64
             CAG      36
HIS          CAC      65
             CAU      35
GLU          GAA      48
             GAG      52
ASP          GAC      48
             GAU      52
TYR          UAC      68
             UAU      32
CYS          UGC      78
             UGU      22
PHE          UUC      56
             UUU      44
MET          AUG     100
TRP          UGG     100
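
As a hedged illustration of how a codon table such as Table I might be consulted, the following Python sketch stores the Table I percentages and looks up the most used codon for each amino acid. Choosing the single most used codon is a simplification assumed here for brevity; the text instead calls for weighing codon choice against polyadenylation signals, ATTTA sequences, doublets and G+C content. The names `PLANT_CODON_USAGE`, `most_used_codon` and `back_translate` are hypothetical.

```python
# Percent usage in plants, transcribed from Table I (RNA codons).
PLANT_CODON_USAGE = {
    "ARG": {"CGA": 7, "CGC": 11, "CGG": 5, "CGU": 25, "AGA": 29, "AGG": 23},
    "LEU": {"CUA": 8, "CUC": 20, "CUG": 10, "CUU": 28, "UUA": 5, "UUG": 30},
    "SER": {"UCA": 14, "UCC": 26, "UCG": 3, "UCU": 21, "AGC": 21, "AGU": 15},
    "THR": {"ACA": 21, "ACC": 41, "ACG": 7, "ACU": 31},
    "PRO": {"CCA": 45, "CCC": 19, "CCG": 9, "CCU": 26},
    "ALA": {"GCA": 23, "GCC": 32, "GCG": 3, "GCU": 41},
    "GLY": {"GGA": 32, "GGC": 20, "GGG": 11, "GGU": 37},
    "ILE": {"AUA": 12, "AUC": 45, "AUU": 43},
    "VAL": {"GUA": 9, "GUC": 20, "GUG": 28, "GUU": 43},
    "LYS": {"AAA": 36, "AAG": 64},
    "ASN": {"AAC": 72, "AAU": 28},
    "GLN": {"CAA": 64, "CAG": 36},
    "HIS": {"CAC": 65, "CAU": 35},
    "GLU": {"GAA": 48, "GAG": 52},
    "ASP": {"GAC": 48, "GAU": 52},
    "TYR": {"UAC": 68, "UAU": 32},
    "CYS": {"UGC": 78, "UGU": 22},
    "PHE": {"UUC": 56, "UUU": 44},
    "MET": {"AUG": 100},
    "TRP": {"UGG": 100},
}


def most_used_codon(amino_acid):
    """Return the codon with the highest percent usage for a three-letter amino acid code."""
    usage = PLANT_CODON_USAGE[amino_acid.upper()]
    return max(usage, key=usage.get)


def back_translate(peptide):
    """Simplified back-translation: join the most used codon for each residue."""
    return "".join(most_used_codon(aa) for aa in peptide)
```

A sequence produced this way would still need to be examined for the features the method seeks to avoid before it could be regarded as a candidate synthetic gene.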



[0042] Regions with many consecutive A+T bases or G+C bases are predicted to have a higher likelihood to form
hairpin structures due to self-complementarity. Disruption of these regions by the insertion of heterogeneous base
pairs is preferred and should reduce the likelihood of the formation of self-complementary secondary structures such
as hairpins, which are known in some organisms to inhibit transcription (transcriptional terminators) and translation
(attenuators). However, it is difficult to predict the biological effect of a potential hairpin forming region.

[0043] It is evident to those skilled in the art that while the above description is directed toward the modification of
the DNA sequences of wild-type genes, the present method can be used to construct a completely synthetic gene for
a given amino acid sequence. Regions with five or more consecutive A+T or G+C nucleotides should be avoided.
Codons should be selected avoiding the TA and CG doublets in codons whenever possible. Codon usage can be
normalized against a plant preferred codon usage table (such as Table I) and the G+C content preferably adjusted to
about 50%. The resulting sequence should be examined to ensure that there are minimal putative plant polyadenylation
signals and ATTTA sequences. Restriction sites found in commonly used cloning vectors are also preferably avoided.
However, placement of several unique restriction sites throughout the gene is useful for analysis of gene expression
or construction of gene variants.
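
Two of the checks named above, the G+C content and the TA and CG doublets, lend themselves to a short Python sketch. It is offered only as an illustration under stated assumptions: the doublet is taken to mean the second and third positions of a codon, which the text does not state explicitly, and the function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def doublet_codons(seq, doublets=("TA", "CG")):
    """Return (offset, codon) for in-frame codons whose 2nd and 3rd bases form a listed doublet.

    Assumption: the reading frame starts at offset 0; trailing bases that do
    not complete a codon are ignored.
    """
    seq = seq.upper()
    hits = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon[1:] in doublets:
            hits.append((i, codon))
    return hits
```

For a candidate sequence, a `gc_content` value near 0.5 and a short `doublet_codons` list would be in keeping with the preferences stated above, although neither check by itself addresses polyadenylation signals or ATTTA sequences.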

Plant Gene Construction 

[0044] The expression of a plant gene which exists in double-stranded DNA form involves transcription of messenger
RNA (mRNA) from one strand of the DNA by RNA polymerase enzyme, and the subsequent processing of the mRNA
primary transcript inside the nucleus. This processing involves a 3' non-translated region which adds polyadenylate
nucleotides to the 3' end of the RNA. Transcription of DNA into mRNA is regulated by a region of DNA usually referred
to as the "promoter." The promoter region contains a sequence of bases that signals RNA polymerase to associate
with the DNA and to initiate the transcription of mRNA using one of the DNA strands as a template to make a
corresponding strand of RNA.

[0045] A number of promoters which are active in plant cells have been described in the literature. These include
the nopaline synthase (NOS) and octopine synthase (OCS) promoters (which are carried on tumor-inducing plasmids
of Agrobacterium tumefaciens), the Cauliflower Mosaic Virus (CaMV) 19S and 35S promoters, the light-inducible
promoter from the small subunit of ribulose bis-phosphate carboxylase (ssRUBISCO, a very abundant plant polypeptide)
and the mannopine synthase (MAS) promoter (Velten et al., 1984 and Velten & Schell, 1985). All of these promoters
have been used to create various types of DNA constructs which have been expressed in plants (see e.g., PCT
publication WO84/02913 (Rogers et al., Monsanto)).

[0046] Promoters which are known or are found to cause transcription of RNA in plant cells can be used in the present
invention. Such promoters may be obtained from plants or plant viruses and include, but are not limited to, the CaMV35S
promoter and promoters isolated from plant genes such as ssRUBISCO genes. As described below, it is preferred that
the particular promoter selected should be capable of causing sufficient expression to result in the production of an
effective amount of protein.

[0047] The promoters used in the DNA constructs (i.e. chimeric plant genes) of the present invention may be modified,
if desired, to affect their control characteristics. For example, the CaMV35S promoter may be ligated to the portion of
the ssRUBISCO gene that represses the expression of ssRUBISCO in the absence of light, to create a promoter which
is active in leaves but not in roots. The resulting chimeric promoter may be used as described herein. For purposes of
this description, the phrase "CaMV35S" promoter thus includes variations of CaMV35S promoter, e.g., promoters
derived by means of ligation with operator regions, random or controlled mutagenesis, etc. Furthermore, the promoters
may be altered to contain multiple "enhancer sequences" to assist in elevating gene expression.
[0048] The RNA produced by a DNA construct of the present invention also contains a 5' non-translated leader
sequence. This sequence can be derived from the promoter selected to express the gene, and can be specifically
modified so as to increase translation of the mRNA. The 5' non-translated regions can also be obtained from viral
RNAs, from suitable eukaryotic genes, or from a synthetic gene sequence. The present invention is not limited to
constructs, as presented in the following examples. Rather, the non-translated leader sequence can be part of the 5'
end of the non-translated region of the coding sequence for the virus coat protein, or part of the promoter sequence,
or can be derived from an unrelated promoter or coding sequence. In any case, it is preferred that the sequence flanking
the initiation site conform to the translational consensus sequence rules for enhanced translation initiation reported by
Kozak (1984).

[0049] The DNA construct of the present invention also contains a modified or fully-synthetic structural coding
sequence encoding the crystal toxin protein of Bacillus thuringiensis which has been changed to enhance the performance
of the gene in plants. The structural genes of the present invention may optionally encode a fusion protein comprising
an amino-terminal chloroplast transit peptide or secretory signal sequence (see for instance, Examples 10 and 11).

[0050] The DNA construct also contains a 3' non-translated region. The 3' non-translated region contains a
polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the viral
RNA. Examples of suitable 3' regions are (1) the 3' transcribed, non-translated regions containing the polyadenylation
signal of Agrobacterium tumor-inducing (Ti) plasmid genes, such as the nopaline synthase (NOS) gene, and (2) plant
genes like the soybean storage protein (7S) genes and the small subunit of the RuBP carboxylase (E9) gene. An
example of a preferred 3' region is that from the 7S gene, described in greater detail in the examples below.

Plant Transformation

[0051] A chimeric plant gene containing a structural coding sequence of the present invention can be inserted into
the genome of a plant by any suitable method. Suitable plants for use in the practice of the present invention include,
but are not limited to, soybean, cotton, alfalfa, oilseed rape, flax, tomato, sugarbeet, sunflower, potato, tobacco, maize,
rice and wheat. Suitable plant transformation vectors include those derived from a Ti plasmid of Agrobacterium
tumefaciens, as well as those disclosed, e.g., by Herrera-Estrella (1983), Bevan (1983), Klee (1985) and EPO publication
120,516 (Schilperoort et al.). In addition to plant transformation vectors derived from the Ti or root-inducing (Ri) plasmids
of Agrobacterium, alternative methods can be used to insert the DNA constructs of this invention into plant cells. Such
methods may involve, for example, the use of liposomes, electroporation, chemicals that increase free DNA uptake,
free DNA delivery via microprojectile bombardment, and transformation using viruses or pollen.

[0052] A particularly useful Ti plasmid cassette vector for transformation of dicotyledonous plants is shown in Figure
5. Referring to Figure 5, the expression cassette pMON893 consists of the enhanced CaMV35S promoter (EN 35S)
and the 3' end including polyadenylation signals from a soybean gene encoding the alpha-prime subunit of
beta-conglycinin. Between these two elements is a multilinker containing multiple restriction sites for the insertion of genes.

[0053] The enhanced CaMV35S promoter was constructed as follows. A fragment of the CaMV35S promoter
extending between position -343 and +9 was previously constructed in pUC13 by Odell et al. (1985). This segment
contains a region identified by Odell et al. (1985) as being necessary for maximal expression of the CaMV35S promoter.
It was excised as a ClaI-HindIII fragment, made blunt ended with DNA polymerase I (Klenow fragment) and inserted
into the HincII site of pUC18. This upstream region of the 35S promoter was excised from this plasmid as a
HindIII-EcoRV fragment (extending from -343 to -90) and inserted into the same plasmid between the HindIII and PstI sites.
The enhanced CaMV35S promoter thus contains a duplication of sequences between -343 and -90 (Kay et al., 1987).

[0054] The 3' end of the 7S gene is derived from the 7S gene contained on the clone designated 17.1 (Schuler et
al., 1982). This 3' end fragment, which includes the polyadenylation signals, extends from an AvaII site located about
30 bp upstream of the termination codon for the beta-conglycinin gene in clone 17.1 to an EcoRI site located about
450 bp downstream of this termination codon.

[0055] The remainder of pMON893 contains a segment of pBR322 which provides an origin of replication in E. coli
and a region for homologous recombination with the disarmed T-DNA in Agrobacterium strain ACO (described below);
the oriV region from the broad host range plasmid RK1; the streptomycin/spectinomycin resistance gene from Tn7;
and a chimeric NPTII gene, containing the CaMV35S promoter and the nopaline synthase (NOS) 3' end, which provides
kanamycin resistance in transformed plant cells.

[0056] Referring to Figure 6, transformation vector plasmid pMON900 is a derivative of pMON893. The enhanced
CaMV35S promoter of pMON893 has been replaced with the 1.5 kb mannopine synthase (MAS) promoter (Velten et
al., 1984). The other segments are the same as plasmid pMON893. After incorporation of a DNA construct into plasmid
vector pMON893 or pMON900, the intermediate vector is introduced into A. tumefaciens strain ACO which contains
a disarmed Ti plasmid. Cointegrate Ti plasmid vectors are selected and used to transform dicotyledonous plants.

[0057] Referring to Figure 7, A. tumefaciens ACO is a disarmed strain similar to pTiB6SE described by Fraley et al.
(1985). For construction of ACO the starting Agrobacterium strain was the strain A208 which contains a nopaline-type
Ti plasmid. The Ti plasmid was disarmed in a manner similar to that described by Fraley et al. (1985) so that essentially
all of the native T-DNA was removed except for the left border and a few hundred base pairs of T-DNA inside the left
border. The remainder of the T-DNA extending to a point just beyond the right border was replaced with a novel piece
of DNA including (from left to right) a segment of pBR322, the oriV region from plasmid RK2, and the kanamycin
resistance gene from Tn601. The pBR322 and oriV segments are similar to the segments in pMON893 and provide a
region of homology for cointegrate formation.

[0058] The following examples are provided to better elucidate the practice of the present invention and should not
be interpreted in any way to limit the scope of the present invention. Those skilled in the art will recognize that various
modifications, truncations etc. can be made to the methods and genes described herein while not departing from the
spirit and scope of the present invention.

Example 1 - Modified B.t.k. HD-1 Gene

[0059] Referring to Figure 2, the wild-type B.t.k. HD-1 gene is known to be expressed poorly in plants as a full length
gene or as a truncated gene. The G+C content of the B.t.k. gene is low (37%), containing many A+T rich regions,
potential polyadenylation sites (18 sites; see Table II for the list of sequences) and numerous ATTTA sequences.



Table II

List of Sequences of the Potential Polyadenylation Signals

AATAAA*      AAGCAT
AATAAT*      ATTAAT
AACCAA       ATACAT
ATATAA       AAAATA
AATCAA       ATTAAA**
ATACTA       AATTAA**
ATAAAA       AATACA**
ATGAAA       CATAAA**

* indicates a potential major plant polyadenylation site.
** indicates a potential minor animal polyadenylation site.
All others are potential minor plant polyadenylation sites.



[0060] Table III lists the synthetic oligonucleotides designed and synthesized for the site-directed mutagenesis of
the B.t.k. HD-1 gene.

Table III

Mutagenesis Primers for B.t.k. HD-1 Gene

Primer      Length (bp)   Sequence
BTK185       18           TCCCCAGATA ATATCAAC
BTK240       48           GGCTTGATTC CTAGCGAACT CTTCGATTCT CTGGTTGATG
                          AGCTGTTC
BTK462       54           CAAAACTGAG AGGTGGAGGT TGGCAGCTTG AACGTACACG
                          GAGAGGAGAG GAAC
BTK669       48           AGTTAGTGTA AGCTCTCTTC TGAACTGGTT GTACCTGATC
                          CAATCTCT
BTK930       39           AGCCATGATC TGGTGACCGG ACCAGTAGTA TTCTCCTCT
BTK1110      32           AGTTGTTGGT TGTTGATCCC GATGTTAAAA GG
BTK1380A     37           GTGATGAAGG GATGATGTTG TTGAACTCAG CACTACG
BTK1380T    100           CAGAAGTTCC AGAGCCAAGA TTAGTAGACT TGGTGAGTGG
                          GATTTGGGTG ATTTGTGATG AAGGGATGAT GTTGTTGAAC
                          TCAGCACTAC GATGTATCCA
BTK1600      27           TGATGTGTGG AACTGAAGGT TTGTGGT
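
The primer listing can be cross-checked mechanically. The following Python sketch, offered only as an illustration, compares each Table III sequence against its stated length and against the 18 to 100 base range mentioned in the text; the helper name `check_primers` and the data layout are assumptions of this sketch.

```python
# (sequence as printed in Table III, stated length in bp)
PRIMERS = {
    "BTK185": ("TCCCCAGATA ATATCAAC", 18),
    "BTK240": ("GGCTTGATTC CTAGCGAACT CTTCGATTCT CTGGTTGATG AGCTGTTC", 48),
    "BTK462": ("CAAAACTGAG AGGTGGAGGT TGGCAGCTTG AACGTACACG GAGAGGAGAG GAAC", 54),
    "BTK669": ("AGTTAGTGTA AGCTCTCTTC TGAACTGGTT GTACCTGATC CAATCTCT", 48),
    "BTK930": ("AGCCATGATC TGGTGACCGG ACCAGTAGTA TTCTCCTCT", 39),
    "BTK1110": ("AGTTGTTGGT TGTTGATCCC GATGTTAAAA GG", 32),
    "BTK1380A": ("GTGATGAAGG GATGATGTTG TTGAACTCAG CACTACG", 37),
    "BTK1380T": ("CAGAAGTTCC AGAGCCAAGA TTAGTAGACT TGGTGAGTGG GATTTGGGTG "
                 "ATTTGTGATG AAGGGATGAT GTTGTTGAAC TCAGCACTAC GATGTATCCA", 100),
    "BTK1600": ("TGATGTGTGG AACTGAAGGT TTGTGGT", 27),
}


def clean(seq):
    """Remove the spacing used in the printed table."""
    return "".join(seq.split())


def check_primers(primers, min_len=18, max_len=100):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, (seq, stated) in primers.items():
        actual = len(clean(seq))
        if actual != stated:
            problems.append(f"{name}: stated {stated} bp, counted {actual} bp")
        if not min_len <= actual <= max_len:
            problems.append(f"{name}: {actual} bp is outside {min_len}-{max_len} bp")
    return problems
```

The same data can be used to confirm that the BTK1380A sequence occurs within BTK1380T, for example with `clean(PRIMERS["BTK1380A"][0]) in clean(PRIMERS["BTK1380T"][0])`.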

[0061] The B.t.k. HD-1 gene (BglII fragment from pMON9921 encoding amino acids 29-607 with a Met-Ala at the
N-terminus) was cloned into pMON7258 (pUC118 derivative which contains a BglII site in the multilinker cloning region)
at the BglII site resulting in pMON5342. The orientation of the B.t.k. gene was chosen so that the opposite strand
(negative strand) was synthesized in filamentous phage particles for the mutagenesis. The procedure of Kunkel (1985)
was used for the mutagenesis using plasmid pMON5342 as starting material.

[0062] The regions for mutagenesis were selected in the following manner. All regions of the DNA sequence of the
B.t.k. gene were identified which contained five or more consecutive base pairs which were A or T. These were ranked
in terms of length and highest percentage of A+T in the surrounding sequence over a 20-30 base pair region. The DNA
was then analysed for regions which might contain polyadenylation sites (see Table II above) or ATTTA sequences.
Oligonucleotides were designed which maximized the elimination of A+T consecutive regions which contained one or
more polyadenylation sites or ATTTA sequences. Two potential plant polyadenylation sites were rated more critical
(see Table II) based on published reports. Codons were selected which increased G+C content, did not generate
restriction sites for enzymes useful for cloning and assembly of the modified gene (BamHI, BglII, SacI, NcoI, EcoRV)
and did not contain the doublets TA or CG, which have been reported to be infrequently found in codons in plants. The
oligonucleotides were at least 18 bp long, ranging up to 100 base pairs, and contained at least 5-8 base pairs of direct
homology to native sequences at the ends of the fragments for efficient hybridization and priming in site-directed
mutagenesis reactions. Figure 2 compares the wild-type B.t.k. HD-1 gene sequence with the sequence which resulted
from the modifications by site-directed mutagenesis.
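
The ranking step described in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the surrounding region is approximated by a fixed window of about 30 bases centred on each run, and ties in run length are broken by the A+T fraction of that window. The names `rank_at_regions` and `at_fraction` are hypothetical.

```python
import re


def at_fraction(seq):
    """Fraction of A and T bases in a sequence; 0.0 for an empty sequence."""
    return sum(base in "AT" for base in seq) / len(seq) if seq else 0.0


def rank_at_regions(seq, min_run=5, window=30):
    """Rank runs of consecutive A/T by run length, then by A+T fraction of the surrounding window."""
    seq = seq.upper()
    regions = []
    for match in re.finditer(f"[AT]{{{min_run},}}", seq):
        start, end = match.span()
        pad = max(0, window - (end - start)) // 2  # bases added on each side of the run
        lo, hi = max(0, start - pad), min(len(seq), end + pad)
        regions.append({
            "start": start,
            "length": end - start,
            "window_at": at_fraction(seq[lo:hi]),
        })
    return sorted(regions, key=lambda r: (r["length"], r["window_at"]), reverse=True)
```

The highest ranked regions would then be the first candidates to examine for polyadenylation sites and ATTTA sequences when designing oligonucleotides.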

[0063] The end result of these changes was to increase the G+C content of the B.t.k. gene from 37% to 41% while also
decreasing the potential plant polyadenylation sites from 18 to 7 and decreasing the ATTTA regions from 13 to 7.
Specifically, the mutagenesis changes from the amino (5') terminus to the carboxy (3') terminus are as follows:

[0064] BTK185 is an 18-mer used to eliminate a plant polyadenylation site in the midst of a nine base pair region of
A+T.

[0065] BTK240 is a 48-mer. Seven base pairs were changed by this oligonucleotide to eliminate three potential
polyadenylation sites (2 AACCAA, 1 AATTAA). Another region close to the region altered by BTK240, starting at bp
312, had a high A+T content (13 of 15 base pairs) and an ATTTA region. However, it did not contain a potential
polyadenylation site and its longest string of uninterrupted A+T was seven base pairs.

[0066] BTK462 is a 54-mer introducing 13 base pair changes. The first six changes were to reduce the A+T richness
of the gene by replacing wild-type codons with codons containing G and C while avoiding the CG doublet. The next
seven changes made by BTK462 were used to eliminate an A+T rich region (13 of 14 base pairs were A or T) containing
two ATTTA regions.

[0067] BTK669 is a 48-mer making nine individual base pair changes eliminating three possible polyadenylation
sites (ATATAA, AATCAA, and AATTAA) and a single ATTTA site.

[0068] BTK930 is a 39-mer designed to increase the G+C content and to eliminate a potential polyadenylation site
(AATAAT - a major site). This region did contain a nine base pair region of consecutive A+T sequence. One of the base
pair changes was a G to A because a G at this position would have created a G+C rich region (CCGG(G)C). Since
sequencing reactions indicate that there can be difficulties generating sequence through G+C consecutive bases, it
was thought to be prudent to avoid generating potentially problematic regions even if they were problematic only in vitro.

[0069] BTK1110 is a 32-mer designed to introduce five changes in the wild-type gene. One potential site (AATAAT
- a major site) was eliminated in the midst of an A+T rich region (19 of 22 base pairs).

[0070] BTK1380A and BTK1380T are responsible for 14 individual base pair changes. The first region (1380A) has
17 consecutive A+T base pairs. In this region is an ATTTA and a potential polyadenylation site (AATAAT). The 100-mer
(1380T) contains all the changes dictated by 1380A. The large size of this primer was in part an experiment to determine
if it was feasible to utilize large oligonucleotides for mutagenesis (over 60 bases in length). A second consideration
was that the 100-mer was used to mutagenize a template which had previously been mutagenized by 1380A. The
original primer ordered to mutagenize the region downstream and adjacent to 1380A did not anneal efficiently to the
desired site, as indicated by an inability to obtain clean sequence utilizing the primer. The large region of homology of
1380T did assure proper annealing. The extended size of 1380T was more of a convenience rather than a necessity.
The second region adjacent to 1380A covered by 1380T has a high A+T content (22 of 29 bases are A or T).

[0071] BTK1600 is a 27-mer responsible for five individual base pair changes. An ATTTA region and a plant
polyadenylation site were identified and the appropriate changes engineered.

[0072] A total of 62 bases were changed by site-directed mutagenesis. The G+C content increased by 55 base pairs,
the potential polyadenylation sites were reduced from 18 to seven and the ATTTA sequences decreased from 13 to
seven. The changes in the DNA sequence resulted in changes in 55 of the 579 codons in the truncated B.t.k. gene in
pMON5342 (approximately 9.5%).

[0073] Referring to Table IV, modified B.t.k. HD-1 genes were constructed that contained all of the above modifications
(pMON5370) or various subsets of individual modifications. These genes were inserted into pMON893 for plant
transformation and tobacco plants containing these genes were analyzed. The analysis of tobacco plants with the individual
modifications was undertaken for several reasons. Expression of the wild type truncated gene in tobacco is very poor,
resulting in infrequent identification of plants toxic to tobacco hornworm (THW). Toxicity is defined by leaf feeding assays
as at least 60% mortality of tobacco hornworm neonate larvae with a damage rating of 1 or less (scale is 0 to 4; 0 is
equivalent to total protection, 4 total damage). The modified HD-1 gene (pMON5370) shows a large increase in expression
(estimated to be approximately 100-fold; see Table VIII) in tobacco. Therefore, increases in expression of the wild-type
gene due to individual modifications would be apparent as a large increase in the frequency of toxic tobacco plants and the
presence of detectable B.t.k. protein. Results are shown in the following table:



Table IV

Relative Effects of Regional Modifications within the B.t.k. Gene

Construct    Position Modified                          # of Plants   # of Toxic Plants
pMON5370     185, 240, 669, 930, 1110, 1380a+b, 1600    38            22
pMON10707    185, 240, 462, 669                         48            19
pMON10706    930, 1110, 1380a+b, 1600                   43             1
pMON10539    185                                        55             2
pMON10537    240                                        57            17
pMON10540    185, 240                                   88            23
pMON10705    462                                        47             1

[0074] The effects of each individual oligonucleotide's changes on expression did reveal some overall trends. Six
different constructs were generated which were designed to identify the key regions. The nine different oligonucleotides
were divided in half by their position on the gene. Changes in the N-terminal half were incorporated into pMON10707
(185, 240, 462, 669). C-terminal half changes were incorporated into pMON10706 (930, 1110, 1380a+b, 1600). The
results of analysis of plants with these two constructs indicate that pMON10707 produces a substantial number of toxic
plants (19 of 48). Protein from these plants is detectable by ELISA analysis. pMON10706 plants were rarely identified
as insecticidal (1 of 43) and the levels of B.t.k. were barely detectable by immunological analysis. Investigation of the
N-terminal changes in greater detail was done with 4 pMON constructs: 10539 (185 alone), 10537 (240 alone), 10540
(185 and 240) and 10705 (462 alone). The results indicate that the presence of the changes in 240 were required to
generate a substantial number of toxic plants (pMON10540, 23 of 88; pMON10537, 17 of 57). The absence of the 240
changes resulted in a low frequency of toxic plants with low B.t.k. protein levels, identical to results with the wild type
gene. These results indicate that the changes in 240 are responsible for a substantial increase in B.t.k. expression
levels over an analogous wild-type construct in tobacco. Changes in additional regions (185, 462, 669) in conjunction
with 240 may result in increases in B.t.k. expression (>2 fold). However, changes at the 240 region of the N-terminal
portion of the gene do result in dramatic increases in expression.
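
The trend described above can be restated numerically from the Table IV counts. The following Python sketch is illustrative only; the flag marking whether a construct carries the 240 changes is read from the "Position Modified" column, and the names used are assumptions of this sketch.

```python
# construct: (plants assayed, toxic plants, carries the 240 changes?) per Table IV
TABLE_IV = {
    "pMON5370": (38, 22, True),
    "pMON10707": (48, 19, True),
    "pMON10706": (43, 1, False),
    "pMON10539": (55, 2, False),
    "pMON10537": (57, 17, True),
    "pMON10540": (88, 23, True),
    "pMON10705": (47, 1, False),
}


def toxic_fraction(table):
    """Return the fraction of toxic plants for each construct."""
    return {name: toxic / plants for name, (plants, toxic, _) in table.items()}


fractions = toxic_fraction(TABLE_IV)
with_240 = [fractions[n] for n, row in TABLE_IV.items() if row[2]]
without_240 = [fractions[n] for n, row in TABLE_IV.items() if not row[2]]
```

On these counts every construct carrying the 240 changes yields toxic plants at a frequency above one in four, while every construct lacking them stays below one in twenty, which is in keeping with the conclusion drawn in the text.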

[0075] Despite the importance of the alteration of the 240 region in expression of modified genes, increased
expression can be achieved by alteration of other regions. Hybrid genes, part wild-type, part synthetic, were generated to
determine the effects of synthetic gene segments on the levels of B.t.k. expression. A hybrid gene was generated with
a synthetic N-terminal third (base pair 1 to 590 of Figure 2: to the XbaI site) with the C-terminal wild type B.t.k. HD-1
(pMON5378). Plants transformed with this vector were as toxic as plants transformed with the modified HD-1 gene
(pMON5370). This is consistent with the alteration of the 240 region. However, pMON10538, a hybrid with a wild-type
N-terminal third (wild type gene for the first 600 base pairs, to the second XbaI site) and a synthetic C-terminal last
two-thirds (base pair 590 to 1845 of Figure 3), was used to transform tobacco and resulted in a dramatic increase in
expression. The levels of expression do not appear to be as high as those seen with the synthetic gene, but are
comparable to the modified gene levels. These results indicate that modification of the 240 segment is not essential
to increased expression since pMON10538 has an intact 240 region. A fully synthetic gene is, in most cases, superior
for expression levels of B.t.k. (See Example 2.)

Example 2 - Fully Synthetic B.t.k. HD-1 Gene

[0076] A synthetic B.t.k. HD-1 gene was designed using the preferred plant codons listed in Table V below. Table V
lists the codons and frequency of use in plant genes of dicotyledonous plants compared to the frequency of their use
in the wild type B.t.k. HD-1 gene (amino acids 1-615) and the synthetic gene of this example. The total number of each
amino acid in this segment of the gene is listed in parentheses under the amino acid designation.



Table V

Codon Usage in Synthetic B.t.k. HD-1 Gene

                      Percent Usage in
Amino Acid   Codon   Plants   Wt B.t.k.   Syn

ARG          CGA        7        11         2
(43)         CGC       11         5         5
             CGG        5         2         0
             CGU       25        14        27
             AGA       29        55        41
             AGG       23        14        25

LEU          CUA        8        16         4
(49)         CUC       20         0        20
             CUG       10         2         6
             CUU       28        22        24
             UUA        5        50         0
             UUG       30        10        45

SER          UCA       14        27         5
             UCC       26         9        28
             UCG        3         8         0
             UCU       21        19        31
             AGC       21         6        32
             AGU       15        31         5

THR          ACA       21        31        14
             ACC       41        19        53
             ACG        7        14         0
             ACU       31        36        33

PRO          CCA       45        35        53
             CCC       19         6        12
             CCG        9        21         3
             CCU       26        38        32

ALA          GCA       23        38        26
             GCC       32         9        29
             GCG        3         3         0
             GCU       41        50        45

GLY          GGA       32        52        45
(46)         GGC       20        17        15
             GGG       11        15         6
             GGU       37        15        34

ILE          AUA       12        39         2
(48)         AUC       45        11        67
             AUU       43        50        30

VAL          GUA        9        45         3
(38)         GUC       20         5        16
             GUG       28        11        37
             GUU       43        39        45

LYS          AAA       38       100        33
(3)          AAG       64         0        67

ASN          AAC       72        27        80
(44)         AAU       28        73        20

GLN          CAA       64        77        61
(31)         CAG       38        23        39

HIS          CAC       65         0        80
(10)         CAU       35       100        20

GLU          GAA       48        87        50
(30)         GAG       52        13        50

ASP          GAC       48        17        65
             GAU       52        83        35

TYR          UAC       68        20        72
             UAU       32        80        28

CYS          UGC       78        50       100
(2)          UGU       22        50         0

PHE          UUC       56        17        83
(36)         UUU       44        83        17

MET          AUG      100       100       100
(9)

TRP          UGG      100       100       100
(9)
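By way of illustration only, percentages of the kind tabulated in Table V could be computed from an in-frame coding sequence along the following lines. This is a minimal sketch and not the procedure used to prepare the table: the function name `codon_usage`, the one-letter amino acid keys, and the short example fragment are assumptions introduced here, and only the standard genetic code is taken as given.

```python
from collections import Counter, defaultdict

# Standard genetic code, written with RNA codons as in Table V.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(coding_sequence):
    """Return {amino_acid: {codon: percent}} for an in-frame coding sequence."""
    rna = coding_sequence.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    counts = defaultdict(Counter)
    for i in range(0, usable, 3):
        codon = rna[i:i + 3]
        amino_acid = CODON_TABLE.get(codon)
        if amino_acid is None or amino_acid == "*":
            continue  # skip ambiguous bases and stop codons
        counts[amino_acid][codon] += 1
    usage = {}
    for amino_acid, codon_counts in counts.items():
        total = sum(codon_counts.values())
        usage[amino_acid] = {
            codon: round(100 * n / total) for codon, n in codon_counts.items()
        }
    return usage


# Hypothetical fragment (not taken from the figures): three Lys and three Leu codons.
example = codon_usage("AAGAAGAAATTGTTGCTT")
print(example["K"])  # {'AAG': 67, 'AAA': 33}
print(example["L"])  # {'UUG': 67, 'CUU': 33}
```

Applied to the wild-type and synthetic sequences of Figure 3, a routine of this kind would be expected to yield the "Wt" and "Syn" columns, subject to rounding.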



[0077] The resulting synthetic gene lacks ATTTA sequences, contains only one potential polyadenylation site and
has a G+C content of 48.5%. Figure 3 is a comparison of the wild-type HD-1 sequence to the synthetic gene sequence
for amino acids 1-615. There is approximately 77% DNA homology between the synthetic gene and the wild-type gene
and 356 of the 615 codons have been changed (approximately 60%).
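The sequence properties quoted above (G+C content, number of ATTTA sequences, number of potential polyadenylation sites, number of codons changed) lend themselves to a simple check. The following is a sketch under stated assumptions: the helper names are hypothetical, the example sequences are invented for illustration, and only the canonical AATAAA signal is listed because the full set of potential plant polyadenylation signals is not reproduced in this section.

```python
import re

# Assumption: only the canonical signal is listed; extend this tuple with the
# full set of potential polyadenylation signals used by the design algorithm.
POLYA_SIGNALS = ("AATAAA",)


def count_overlapping(sequence, motif):
    """Count occurrences of motif in sequence, including overlapping ones."""
    return len(re.findall(f"(?={re.escape(motif)})", sequence))


def sequence_features(dna):
    """Summarize G+C content and destabilizing motifs of a DNA sequence."""
    dna = dna.upper()
    gc = dna.count("G") + dna.count("C")
    return {
        "gc_percent": round(100 * gc / len(dna), 1),
        "attta_sites": count_overlapping(dna, "ATTTA"),
        "polya_sites": sum(count_overlapping(dna, s) for s in POLYA_SIGNALS),
    }


def codons_changed(wild_type, synthetic):
    """Number of codon positions that differ between two equal-length genes."""
    if len(wild_type) != len(synthetic):
        raise ValueError("sequences must have the same length")
    return sum(
        wild_type[i:i + 3].upper() != synthetic[i:i + 3].upper()
        for i in range(0, len(wild_type), 3)
    )


# Hypothetical 17-base sequence containing one ATTTA and one AATAAA.
features = sequence_features("ATTTAGCGCAATAAAGC")
print(features)

# Hypothetical 3-codon genes differing at the second and third codons.
changed = codons_changed("ATGAAATTA", "ATGAAGTTG")
print(changed)  # 2
```

For the synthetic HD-1 gene the same summary would be expected to report no ATTTA sequences and a G+C content near 48.5%, as stated in the text.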

Example 3 - Synthetic B.t.k. HD-73 Gene

[0078] The crystal protein toxin from B.t.k. HD-73 exhibits a higher unit activity against some important agricultural
pests. The toxin proteins of HD-1 and HD-73 exhibit substantial homology (~90%) in the N-terminal 450 amino acids,
but differ substantially in the amino acid region 451-615. Fusion proteins comprising amino acids 1-450 of HD-1 and
451-615 of HD-73 exhibit the insecticidal properties of the wild-type HD-73. The strategy employed was to use the 5'
two-thirds of the synthetic HD-1 gene (first 1350 bases, up to the SacI site) and to dramatically modify the final 590
bases (through amino acid 645) of the HD-73 in a manner consistent with the algorithm used to design the synthetic
HD-1 gene. Table VI below lists the oligonucleotides used to modify the HD-73 gene in the order used in the gene from
5' to 3' end. Nine oligonucleotides were used in a 590 base pair region, each oligonucleotide ranging in size from 33 to 60
bases. The only regions left unchanged were areas where there were no long consecutive strings of A or T bases
(longer than six). All polyadenylation sites and ATTTA sites were eliminated.
Table VI

Mutagenesis Primers for B.t.k. HD-73

Primer     Length (bp)   Sequence

73K1363    51            AATACTATCG GATGCGATGA TGTTGTTGAA CTCAGCACTA CGGTGTATCC A
73K1437    33            TCCTGAAATG ACAGAACCGT TGAAGAGAAA GTT
73K1471    48            ATTTCCACTG CTGTTGAGTC TAACGAGGTC TCCACCAGTG AATCCTGG
73K1561    60            GTGAATAGGG GTCACAGAAG CATACCTCAC ACGAACTCTA TATCTGGTAG ATGTTGGATGG
73K1642    33            TGTAGCTGGA ACTGTATTGG AGAAGATGGA TGA
73K1675    48            TTCAAAGTAA CCGAAATCGC TGGATTGGAG ATTATCCAAG GAGGTAGC
73K1741    39            ACTAAAGTTT CTAACACCCA CGATGTTACC GAGTGAAGA
73K1797    36            AACTGGAATG AACTCGAATC TGTCGATAAT CACTCC
73KTERM    54            GGACACTAGA TCTTAGTGAT AATCGGTCAC ATTTGTCTTG AGTCCAAGCT GGTT

[0079] The resulting gene has two potential polyadenylation sites (compared to 18 in the WT) and no ATTTA
sequences (12 in the WT). The G+C content has increased from 37% to 48%. A total of 59 individual base pair changes
were made using the primers in Table VI. Overall, there is 90% DNA homology between the region of the HD-73 gene
modified by site-directed mutagenesis and the wild-type sequence of the analogous region of HD-73. The synthetic
HD-73 is a hybrid of the first 1360 bases from the synthetic HD-1 and the next 590 bases or so of modified HD-73
sequence. Figure 4 is a comparison of the above-described synthetic B.t.k. HD-73 and the wild-type B.t.k. HD-73 encoding
amino acids 1-645. In the modified region of the HD-73 gene 44 of the 170 codons (25%) were changed as a result of
the site-directed mutagenesis changes resulting from the oligonucleotides found in Table VI. Overall, approximately
50% of the codons in the synthetic B.t.k. HD-73 differ from the analogous segment of the wild-type HD-73 gene.

[0080] A one base pair deletion in the synthetic HD-73 gene was detected in the course of sequencing the 3' end at
base pair 1890. This results in a frame-shift mutation at amino acid 625 with a premature stop codon at amino acid
640 (pMON5379). Table VII below compares the codon usage of the wild-type gene of B.t.k. HD-73 versus the synthetic
gene of this example for amino acids 451-645 and codon usage of naturally occurring genes of dicotyledonous plants.
The total number of each amino acid encoded in this segment of the gene is found in the parentheses under the amino
acid designation.

Table VII

Codon Usage in Synthetic B.t.k. HD-73 Gene

                      Percent Usage in
Amino Acid   Codon   Plants   Wt HD-73   Syn

ARG          CGA        7        10         0
(10)         CGC       11         0         8
             CGG        5        10         0
             CGU       25        20        23
             AGA       29        60        62
             AGG       23         0         8

Table VII (continued)



Codon Usage in Synthetic B.t.k. HD-73 Gene

                      Percent Usage in
Amino Acid   Codon   Plants   Wt HD-73   Syn

LEU          CUA        8        25         8
(12)         CUC       20        17
             CUG       10        17         8
             CUU       28         8         0
             UUA        5        33         8
             UUG       30         0        17

SER          UCA       14        24        18
(21)         UCC       26        10        27
             UCG        3        10         0
             UCU       21        24        18
             AGC       21         0        14
             AGU       15        33        23

THR          ACA       21        47        38
(15)         ACC       41        13        31
             ACG        7        13         0
             ACU       31        27        31

PRO          CCA       45        71        71
(7)          CCC       19         0         0
             CCG        9        14         0
             CCU       26        14        29

ALA          GCA       23        29        31
(14)         GCC       32         7         8
             GCG        3        21        15
             GCU       41        43        46

GLY          GGA       32        33        43
(15)         GGC       20         0         0
             GGG       11        27        14
             GGU       37        40        43

ILE          AUA       12        33         7
(15)         AUC       45         7        40
             AUU       43        60        53

VAL          GUA        9        40         7
(15)         GUC       20         0         7
             GUG       28        20        36
             GUU       43        40        50

LYS          AAA       36        67       100
(3)          AAG       64        33         0

ASN          AAC       72        20        53
(20)         AAU       28        80        47

GLN          CAA       64        60        67
(5)          CAG       36        40        33

HIS          CAC       65        67       100
(3)          CAU       35        33         0

GLU          GAA       48        86        57
             GAG       52        14        43

ASP          GAC       48        40        50
(5)          GAU       52        60        50

TYR          UAC       68         0        20
(5)          UAU       32       100        80

CYS          UGC       78         0         0
(0)          UGU       22         0         0

PHE          UUC       56         8        67
(13)         UUU       44        92        33

MET          AUG      100       100       100
(2)

TRP          UGG      100       100       100
(2)

[0081] Another truncated synthetic HD-73 gene was constructed. The sequence of this synthetic HD-73 gene is
identical to that of the above synthetic HD-73 gene in the region in which they overlap (amino acids 29-615), and it
also encodes Met-Ala at the N-terminus. Figure 8 shows a comparison of this truncated synthetic HD-73 gene with the
N-terminal Met-Ala versus the wild-type HD-73 gene.

[0082] While the previous examples have been directed at the preparation of synthetic and modified genes encoding
truncated B.t.k. proteins, synthetic or modified genes can also be prepared which encode full length toxin proteins.

[0083] One full length B.t.k. gene consists of the synthetic HD-73 sequence of Figure 4 from nucleotide 1-1845 plus
wild-type HD-73 sequence encoding amino acids 616 to the C-terminus of the native protein. Figure 9 shows a
comparison of this synthetic/wild-type full length HD-73 gene versus the wild-type full length HD-73 gene.

[0084] Another full length B.t.k. gene consists of the synthetic HD-73 sequence of Figure 4 from nucleotide 1-1845
plus a modified HD-73 sequence encoding amino acids 616 to the C-terminus of the native protein. The C-terminal portion
has been modified by site-directed mutagenesis to remove putative polyadenylation signals and ATTTA sequences
according to the algorithm of Figure 1. Figure 10 shows a comparison of this synthetic/modified full length HD-73 gene
versus the wild-type full length HD-73 gene.

[0085] Another full length B.t.k. gene consists of a fully synthetic HD-73 sequence which incorporates the synthetic
HD-73 sequence of Figure 4 from nucleotide 1-1845 plus a synthetic sequence encoding amino acids 616 to the
C-terminus of the native protein. The C-terminal synthetic portion has been designed to eliminate putative polyadenylation
signals and ATTTA sequences and to include plant preferred codons. Figure 11 shows a comparison of this fully
synthetic full length HD-73 gene versus the wild-type full length HD-73 gene.

[0086] Alternatively, another full length B.t.k. gene consists of a fully synthetic sequence comprising base pairs
1-1830 of B.t.k. HD-1 (Figure 3) and base pairs 1834-3534 of B.t.k. HD-73 (Figure 11).

Example 4 - Expression of Modified and Synthetic B.t.k. HD-1 and Synthetic HD-73

[0087] A number of plant transformation vectors for the expression of B.t.k. genes were constructed by incorporating
the structural coding sequences of the previously described genes into plant transformation cassette vector pMON893.
The respective intermediate transformation vector is inserted into a suitable disarmed Agrobacterium vector such as
A. tumefaciens ACO, supra. Tissue explants are cocultured with the disarmed Agrobacterium vector and plants
regenerated under selection for kanamycin resistance using known protocols: tobacco (Horsch et al., 1985); tomato
(McCormick et al., 1986) and cotton (Trolinder et al., 1987).

a) Tobacco. 


[0088] The levels of B.t.k. HD-1 protein in transgenic tobacco plants containing pMON9921 (wild type truncated),
pMON5370 (modified HD-1, Example 1, Figure 2) and pMON5377 (synthetic HD-1, Example 2, Figure 3) were analyzed
by Western analysis. Leaf tissue was frozen in liquid nitrogen, ground to a fine powder and then ground in a 1:2
(wt:vol) ratio of SDS-PAGE sample buffer. Samples were frozen on dry ice, then incubated for 10 minutes in a boiling water
bath and microfuged for 10 minutes. The protein concentration of the supernatant was determined by the method of
Bradford (Anal. Biochem. 72:248-254). Fifty µg of protein was run per lane on 9% SDS-PAGE gels, the protein
transferred to nitrocellulose and the B.t.k. HD-1 protein visualized using antibodies produced against B.t.k. HD-1 protein as
the primary antibody and alkaline phosphatase conjugated second antibody as described by the manufacturer
(Promega, Madison, WI). Purified HD-1 tryptic fragment was used as the control. Whereas the B.t.k. protein from tobacco
plants containing pMON9921 was below the level of detection, the B.t.k. protein from plants containing the modified
(pMON5370) and synthetic (pMON5377) genes was easily detected. The B.t.k. protein from plants containing
pMON9921 remained undetectable, even with 10 fold longer incubation times. The relative levels of B.t.k. HD-1 protein
in these plants are estimated in Table VIII. Because the protein from plants containing pMON9921 was not observed,
the level of protein in these plants was estimated from the relative mRNA levels (see below). Plants containing the
modified gene (pMON5370) expressed approximately 100 fold more B.t.k. protein than plants containing the wild-type
gene (pMON9921). Plants containing the fully synthetic B.t.k. HD-1 gene (pMON5377) expressed approximately five
fold more protein than plants containing the modified gene. The modified gene contributes the majority of the increase
in B.t.k. expression observed. The plants used to generate the above data are the best representatives from each
construct based either on a tobacco hornworm bioassay or on data derived from previous Western analysis.

Table VIII

Expression of B.t.k. HD-1 Protein in Transgenic Tobacco

Gene Description   Vector      B.t.k. Protein*   Fold Increase in
                               Concentration     B.t.k. Expression

Wild type          pMON9921         10                  1
Modified           pMON5370       1000                100
Synthetic          pMON5377       5000                500

* B.t.k. protein concentrations are expressed in ng/mg of total soluble protein. The level of B.t.k. protein for plants
containing the wild type gene is estimated from mRNA levels.



[0089] Plants containing these genes were tested for bioactivity to determine whether the increased quantities of
protein observed by Western analysis result in a corresponding increase in bioactivity. Leaves from the same plants
used for the Western data in Table VIII were tested for bioactivity against two insects. A detached leaf bioassay was first
done using tobacco hornworm, an extremely sensitive lepidopteran insect. Leaves from all three transgenic tobacco
plants were totally protected and 100% mortality of tobacco hornworm was observed (see Table IX below). A much less
sensitive insect, beet armyworm, was then used in another detached leaf bioassay. Beet armyworm is approximately
500 fold less sensitive to B.t.k. HD-1 protein than tobacco hornworm. The difference in sensitivity of these two insects
was determined using purified HD-1 protein in a diet incorporation assay (see below). Plants containing the wild-type
gene (pMON9921) showed only minimal protection against beet armyworm, whereas plants containing the modified
gene showed almost complete protection and plants containing the fully synthetic gene were totally protected against
beet armyworm damage. The results of these bioassays confirm the levels of B.t.k. HD-1 expression observed in the
Western analysis and demonstrate that the increased levels of B.t.k. HD-1 protein correlate with increased insecticidal
activity.



Table IX

Protection of Tobacco Plants from Tobacco Hornworm and Beet Armyworm

Gene Description   Vector      Tobacco Hornworm   Beet Armyworm
                               Damage*            Damage*

None               None             NL                 NL
Wild type          pMON9921          0                  3
Modified           pMON5370          0                  1
Synthetic          pMON5377          0                  0

* Extent of insect damage was rated: 0, no damage; 1, slight; 2, moderate; 3, severe; or NL, no leaf left.



[0090] The bioactivity of the B.t.k. HD-1 protein produced by these transgenic plants was further investigated to more
accurately quantitate the relative activities. Leaf tissue from tobacco plants containing the wild-type, modified and
synthetic genes was ground in 100 mM sodium carbonate buffer, pH 10 at a 1:2 (wt:vol) ratio. Particulate material
was removed by centrifugation. The supernatant was incorporated into a synthetic diet similar to that described by
Marrone et al. (1985). The diet medium was prepared the day of the test with the plant extract solutions incorporated
in place of the 20% water component. One ml of the diet was aliquoted into 96 well plates.

[0091] After the diet dried, one neonate tobacco budworm larva was added to each well. Sixteen insects were tested
with each plant sample. The plates were incubated at 27°C. After seven days, the larvae from each treatment were
combined and weighed on an analytical balance. The average weight per insect was calculated and compared to a
standard curve relating B.t.k. protein concentrations to average larval weight. Insect weight was inversely proportional
(in a logarithmic manner) to the relative increase in B.t.k. protein concentration. The amount of B.t.k. HD-1 protein,
based on the extent of larval growth inhibition, was determined for two different plants containing each of the three
genes. The specific activity (ng of B.t.k. HD-1 per mg of plant protein) was determined for each plant. Plants containing
the modified HD-1 gene (pMON5370) averaged approximately 1400 ng (1200 and 1600 ng) of B.t.k. HD-1 per mg of
plant extract protein. This value compares closely with the 1000 ng of B.t.k. HD-1 protein per mg of plant extract protein
as determined by Western analysis (Table VIII). B.t.k. HD-1 concentrations for the plants containing the synthetic HD-1
gene averaged approximately 8200 ng (7200 and 9200 ng) of B.t.k. HD-1 protein per mg of plant extract protein. This
number compares well to the 5000 ng of HD-1 protein per mg of plant extract protein estimated by Western analysis.
Likewise, plants containing the synthetic gene showed approximately a six-fold higher specific activity than the
corresponding plants containing the modified gene for these bioassays. In the Western analysis the ratio was approximately
10 fold; again both are in good agreement. The level of B.t.k. protein in plants containing the wild-type HD-1 gene
(pMON9921) was too low to give a significant decrease in larval weight and hence was below a level that could be
quantitated in this assay. In conclusion, the levels of B.t.k. HD-1 protein determined by both the bioassays and the
Western analysis for these plants containing the modified and synthetic genes agree, which demonstrates that the
B.t.k. HD-1 protein produced by these plants is biologically active.
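The bioassay arithmetic described above can be sketched as follows, for illustration only. The duplicate plant values (1200 and 1600 ng; 7200 and 9200 ng) are those reported in the text. The standard curve points, however, are HYPOTHETICAL and serve only to show how a curve in which larval weight falls linearly with the logarithm of concentration may be fitted and inverted; the function names are likewise assumptions.

```python
import math
from statistics import mean


def fit_log_linear(concentrations, weights):
    """Least-squares fit of weight = intercept + slope * log10(concentration)."""
    xs = [math.log10(c) for c in concentrations]
    x_mean, y_mean = mean(xs), mean(weights)
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, weights))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    return y_mean - slope * x_mean, slope


def concentration_from_weight(weight, intercept, slope):
    """Invert the fitted curve to estimate concentration from larval weight."""
    return 10 ** ((weight - intercept) / slope)


# HYPOTHETICAL standard curve: concentration (ng/mg) vs. average larval weight (mg).
standard_conc = [10, 100, 1000, 10000]
standard_weight = [40.0, 30.0, 20.0, 10.0]
intercept, slope = fit_log_linear(standard_conc, standard_weight)
estimate = concentration_from_weight(25.0, intercept, slope)
print(round(estimate))  # 316

# Duplicate plant values reported in the text (ng B.t.k. HD-1 per mg extract protein).
modified = mean([1200, 1600])
synthetic = mean([7200, 9200])
ratio = synthetic / modified
print(modified, synthetic, round(ratio, 1))  # 1400 8200 5.9
```

The ratio of about 5.9 corresponds to the "approximately six-fold" higher specific activity stated for the synthetic gene.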

[0092] The levels of mRNA were determined in the plants containing the wild-type B.t.k. HD-1 gene (pMON9921)
and the modified gene (pMON5370) to establish whether the increased levels of protein production result from
increased transcription or translation. mRNA from plants containing the synthetic gene could not be analyzed directly
with the same DNA probe as used for the wild-type and modified genes because of the numerous changes made in
the coding sequence. mRNA was isolated and hybridized with a single-stranded DNA probe homologous to
approximately the 5' 90 bp of the wild-type or modified gene coding sequences. The hybrids were digested with S1 nuclease
and the protected probe fragments analyzed by gel electrophoresis. Because the procedure used a large excess of
probe and long hybridization time, the amount of protected probe is proportional to the amount of B.t.k. mRNA present
in the sample. Two plants expressing the modified gene (pMON5370) were found to produce up to ten-fold more RNA
than a plant expressing the wild-type gene (pMON9921).

[0093] The increased mRNA level from the modified gene is consistent with the result expected from the modifications
introduced into this gene. However, this 10 fold increase in mRNA with the modified gene compared to the wild-type
gene is in contrast to the 100 fold increase in B.t.k. protein from these genes in tobacco plants. If the two mRNAs were
equally well translated then a 10 fold increase in stable mRNA would be expected to yield a 10 fold increase in protein.
The higher increase in protein indicates that the modified gene mRNA is translated at about a 10 fold higher efficiency
than wild-type. Thus, about half of the total effect on gene expression can be explained by changes in mRNA levels
and about half by changes in translational efficiency. This increase in translational efficiency is striking in that only about
9.5% of the codons have been changed in the modified gene; that is, this effect is clearly not due to wholesale codon
usage changes. The increased translational efficiency could be due to changes in mRNA secondary structure that
affect translation or to the removal of specific translational blockades due to specific codons that were changed.
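The reasoning above treats the protein increase as the product of an mRNA effect and a translational effect. A minimal sketch of that arithmetic follows; the function names are hypothetical, and apportioning the effect on a logarithmic scale is an assumption made here to express "about half" of a multiplicative effect.

```python
import math


def translational_fold(protein_fold, mrna_fold):
    """Fold change in protein made per mRNA, given protein and mRNA fold changes."""
    return protein_fold / mrna_fold


def log_share(part_fold, total_fold):
    """Fraction of a multiplicative effect attributable to one factor (log scale)."""
    return math.log10(part_fold) / math.log10(total_fold)


protein_fold = 100  # modified vs. wild-type protein (Western analysis)
mrna_fold = 10      # modified vs. wild-type mRNA (S1 nuclease protection)

translation_fold = translational_fold(protein_fold, mrna_fold)
mrna_share = log_share(mrna_fold, protein_fold)
translation_share = log_share(translation_fold, protein_fold)

print(translation_fold)   # 10.0
print(mrna_share)         # 0.5
print(translation_share)  # 0.5
```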
[0094] The increased expression seen with the synthetic HD-1 gene was also seen with a synthetic HD-73 gene in
tobacco. B.t.k. HD-73 was undetected in extracts of tobacco plants containing the wild-type truncated HD-73 gene
(pMON5367), whereas B.t.k. HD-73 protein was easily detected in extracts from tobacco plants containing the synthetic
HD-73 gene of Figure 4 (pMON5383). Approximately 1000 ng of B.t.k. HD-73 protein was detected per mg of total
soluble plant protein.

[0095] As described in Example 3 above, the B.t.k. HD-73 protein encoded in pMON5383 contains a small C-terminal
extension of amino acids not encoded in the wild-type HD-73 protein. These extra amino acids had no effect on insect
toxicity or on increased plant expression. A second synthetic HD-73 gene was constructed as described in Example
3 (Figure 8) and used to transform tobacco (pMON5390). Analysis of plants containing pMON5390 showed that this
gene was expressed at levels comparable to that of pMON5383 and that these plants had similar insecticidal efficacy.

[0096] In tobacco plants the synthetic HD-1 gene was expressed at approximately a 5-fold higher level than the
synthetic HD-73 gene. However, this synthetic HD-73 gene still was expressed at least 100-fold better than the
wild-type HD-73 gene. The HD-73 protein is approximately 5-fold more toxic to many insect pests than the HD-1 protein,
so both synthetic HD-1 and HD-73 genes provide approximately comparable insecticidal efficacy in tobacco.

[0097] The full length B.t.k. HD-73 genes described in Example 3 were also incorporated into the plant transformation
vector pMON893 so that they were expressed from the En 35S promoter. The synthetic/wild-type full length HD-73
gene of Figure 9 was incorporated into pMON893 to create pMON10505. The synthetic/modified full length HD-73
gene of Figure 10 was incorporated into pMON893 to create pMON10526. The fully synthetic HD-73 gene of Figure
11 was incorporated into pMON893 to create pMON10518. These vectors were used to obtain transformed tobacco
plants, and the plants were analyzed for insecticidal efficacy and for B.t.k. HD-73 protein levels by Western blot or
ELISA immunoassay.

[0098] Tobacco plants containing all three of these full length B.t.k. genes produced detectable B.t.k. protein and
showed 100% mortality of tobacco hornworm. This result is surprising in light of previously reported attempts to express
the full length B.t.k. genes in transgenic plants. Vaeck et al. (1987) reported that a full length B.t.k. berliner gene similar
to our HD-1 gene could not be detectably expressed in tobacco. Barton et al. (1987) reported a similar result for another
full length gene from B.t.k. HD-1 (the so called 4.5 kb gene), and further indicated that tobacco callus containing this
gene became necrotic, indicating that the full length gene product was toxic to plant cells. Fischhoff et al. (1987) reported
that the full length B.t.k. HD-1 gene in tomato was poorly expressed compared to a truncated gene, and no plants that
were fully toxic to tobacco hornworm could be recovered. All three of the above reports indicated much higher
expression levels and recovery of toxic plants if the respective B.t.k. genes were truncated. Adang et al. reported that the full
length HD-73 gene yielded a few tobacco plants with some biological activity (none were highly toxic) against hornworm
and barely detectable B.t.k. protein. It was also noted by them that the major B.t.k. mRNA in these plants was a truncated
1.7 kb species that would not encode a functional toxin. This indicated improper expression of the gene in tobacco. In
contrast to all of these reports, the three full length B.t.k. HD-73 genes described above all lead to relatively high levels
of protein and high levels of insect toxicity.

[0099] B.t.k. protein and mRNA levels in tobacco plants are shown in Table X for these three vectors. As can be
seen from the table, the synthetic/wild-type gene (pMON10506) produces B.t.k. protein as about 0.01% of total soluble
protein; the synthetic/modified gene produces B.t.k. as about 0.02% of total soluble protein; and the fully synthetic
gene produces B.t.k. as about 0.2% of total soluble protein. B.t.k. mRNA was analyzed in these plants by Northern
blot analysis using the common 5' synthetic half of the genes as a probe. As shown in Table X, the increased protein
levels can largely be attributed to increased mRNA levels. Compared to the truncated modified and synthetic genes,
this could indicate that the major contributors to increased translational efficiency are in the 5' half of the gene while
the 3' half of the gene contains mostly determinants of mRNA stability. The increased protein levels also indicate that
increasing the amount of the full length gene that is synthetic or modified increases B.t.k. protein levels. Compared to
the truncated synthetic B.t.k. HD-73 genes (pMON5383 or pMON5390), the fully synthetic gene (pMON10518)
produces as much or slightly more B.t.k. protein, demonstrating that the full length genes are capable of being expressed
at high levels in plants. These tobacco plants with high levels of full length HD-73 protein show no evidence of
abnormality and are fully fertile. The B.t.k. protein levels in these plants also produce the expected levels of insect toxicity
based on feeding studies with beet armyworm or diet incorporation assays of plant extracts with tobacco budworm.
The B.t.k. protein detected by Western blot analysis in these tobacco plants often contains a varying amount of protein
of about 80 kDa which is apparently a proteolytic fragment of the full length protein. The C-terminal half of the full length
protein is known to be proteolytically sensitive, and similar proteolytic fragments are seen from the full length gene in
E. coli and B.t. itself. These fragments are fully insecticidal. The Northern analysis indicated that essentially all of the
mRNA from these full length genes was of the expected full length size. There is no evidence of truncated mRNAs that
could give rise to the 80 kDa protein fragment. In addition, it is possible that the fragment is not present in intact plant
cells and is merely due to proteolysis during extraction for immunoassay.

Table X

Full Length B.t.k. HD-73 Protein and mRNA Levels in Transgenic Tobacco Plants

Gene description      Vector       B.t.k. protein    Relative B.t.k.
                                   concentration     mRNA level

Synthetic/wild type   pMON10506        >100               0.5
Synthetic/modified    pMON10526         400               1
Fully synthetic       pMON10518       >2000              40

[0100] Thus, there is no serious impediment to producing high levels of B.t.k. HD-73 protein in plants from synthetic
genes, and this is expected to be true of other full length lepidopteran active genes such as B.t.k. HD-1 or B.t.
entomocidus. The fully synthetic B.t.k. HD-1 gene of Example 3 has been assembled in plant transformation vectors such
as pMON893.

[0101] The fully synthetic gene in pMON10518 was also utilized in another plant vector and analyzed in tobacco
plants. Although the CaMV35S promoter is generally a high level constitutive promoter in most plant tissues, the
expression level of genes driven by the CaMV35S promoter is low in floral tissue relative to the levels seen in leaf tissue.
Because the economically important targets damaged by some insects are the floral parts or derived from floral parts
(e.g., cotton squares and bolls, tobacco buds, tomato buds and fruit), it may be advantageous to increase the expression
of B.t. protein in these tissues over that obtained with the CaMV35S promoter.

[0102] The 35S promoter of Figwort Mosaic Virus (FMV) is analogous to the CaMV35S promoter. This promoter has
been isolated and engineered into a plant transformation vector analogous to pMON893. Relative to the CaMV
promoter, the FMV 35S promoter is highly expressed in the floral tissue, while still providing similar high levels of gene
expression in other tissues such as leaf. A plant transformation vector, pMON10517, was constructed in which the full
length synthetic B.t.k. HD-73 gene of Figure 11 was driven by the FMV 35S promoter. This vector is identical to
pMON10518 of Example 3 except that the FMV promoter is substituted for the CaMV promoter. Tobacco plants
transformed with pMON10517 and pMON10518 were obtained and compared for expression of the B.t.k. protein by Western
blot or ELISA immunoassay in leaf and floral tissue. This analysis showed that pMON10517 containing the FMV
promoter expressed the full length HD-73 protein at higher levels in floral tissue than pMON10518 containing the CaMV
promoter. Expression of the full length B.t.k. HD-73 protein from pMON10517 in leaf tissue is comparable to that seen
with the most highly expressing plants containing pMON10518. However, when floral tissue was analyzed, tobacco
plants containing pMON10518 that had high levels of B.t.k. protein in leaf tissue did not have detectable B.t.k. protein
in the flowers. On the other hand, flowers of tobacco plants containing pMON10517 had levels of B.t.k. protein nearly
as high as the levels in leaves at approximately 0.05% of total soluble protein. This analysis showed that the FMV
promoter could be used to produce relatively high levels of B.t.k. protein in floral tissue compared to the CaMV promoter.



b) Tomato. 

[0103] The wild-type, modified and synthetic B.t.k. HD-1 genes tested in tobacco were introduced into other plants 
to demonstrate the broad utility of this invention. Transgenic tomatoes were produced which contain these three genes. 
Data show that the increased expression observed with the modified and synthetic gene in tobacco also extends to 
tomato. Whereas the B.t.k. HD-1 protein is only barely detectable in plants containing the wild-type HD-1 gene 
(pMON9921), B.t.k. HD-1 was readily detected and the levels determined for plants containing the modified 
(pMON5370) or synthetic (pMON5377) genes. Expression levels for the plants containing the wild-type, modified and 
synthetic HD-1 genes were approximately 10, 100 and 500 ng per mg of total plant extract, respectively (see Table XI 
below). The increase in B.t.k. HD-1 protein for the modified gene accounted for the majority of the increase observed: 
10-fold higher than the plants containing the wild-type gene, compared to only an additional five-fold increase for plants 
containing the synthetic gene. Again the site-directed changes made in the modified gene are the major contributors to 
the increased expression of B.t.k. HD-1. 

Table XI 

B.t.k. HD-1 Expression in Transgenic Tomato Plants 

Gene Description    Vector      B.t.k. Protein* Concentration    Fold Increase in B.t.k. Expression 
Wild type           pMON9921    10                               1 
Modified            pMON5370    100                              10 
Synthetic           pMON5377    500                              50 

* B.t.k. HD-1 protein concentrations are expressed in ng/mg of total soluble plant protein. Data for plants containing 
the wild-type gene are estimates from mRNA levels and protein levels determined by ELISA. 
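
The fold increases reported with Table XI follow directly from the measured concentrations. As a minimal sketch (the dictionary and function names are illustrative and do not come from the specification), the ratios can be recomputed as follows:

```python
# Concentrations from Table XI, in ng B.t.k. HD-1 per mg total soluble protein.
# The variable and function names below are hypothetical, for illustration only.
concentration_ng_per_mg = {
    "wild type (pMON9921)": 10,
    "modified (pMON5370)": 100,
    "synthetic (pMON5377)": 500,
}


def fold_increase(value, baseline):
    """Return the ratio of an expression level to a baseline level."""
    return value / baseline


baseline = concentration_ng_per_mg["wild type (pMON9921)"]
fold_vs_wild_type = {
    gene: fold_increase(level, baseline)
    for gene, level in concentration_ng_per_mg.items()
}

# Additional gain of the synthetic gene over the modified gene.
synthetic_over_modified = fold_increase(
    concentration_ng_per_mg["synthetic (pMON5377)"],
    concentration_ng_per_mg["modified (pMON5370)"],
)

for gene, fold in fold_vs_wild_type.items():
    print(f"{gene}: {fold:g}-fold relative to wild type")
print(f"synthetic vs. modified: {synthetic_over_modified:g}-fold")
```

The modified gene alone accounts for a 10-fold step, and the synthetic gene adds a further 5-fold step on top of it, which is the comparison drawn in the text.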



[0104] These differences in B.t.k. HD-1 expression were confirmed with bioassays against tobacco hornworm and 
beet armyworm. Leaves from tomato plants containing each of these genes controlled tobacco hornworm damage and 
produced 100% mortality. With beet armyworm, leaves from plants containing the wild-type HD-1 gene (pMON9921) 
showed significant damage, leaves from plants containing the modified gene (pMON5370) showed less damage and 
leaves from plants containing the synthetic gene (pMON5377) were completely protected (see Table XII below). 



Table XII 

Protection of Tomato Plants from Tobacco Hornworm and Beet Armyworm 

Gene Description    Vector      Tobacco Hornworm Damage*    Beet Armyworm Damage* 
None                None        NL                          NL 
Wild type           pMON9921    0                           3 
Modified            pMON5370    0                           1 
Synthetic           pMON5377    0                           0 

* Damage was rated as shown in Table IX. 



[0105] The generality of the synthetic gene approach was extended in tomato with a synthetic B.t.k. HD-73 gene. 
[0106] In tomato, extracts from plants containing the wild-type truncated HD-73 gene (pMON5367) showed no 
detectable HD-73 protein. Extracts from plants containing the synthetic HD-73 gene (pMON5383) showed high levels of 
B.t.k. HD-73 protein, approximately 2000 ng per mg of plant extract protein. These data clearly demonstrate that the 
changes made in the synthetic HD-73 gene lead to dramatic increases in the expression of the HD-73 protein in tomato 
as well as in tobacco. 

[0107] In contrast to tobacco, the synthetic HD-73 gene in tomato is expressed at approximately 4-fold to 5-fold 
higher levels than the synthetic HD-1 gene. Because the HD-73 protein is about 5-fold more active than the HD-1 
protein against many insect pests including Heliothis species, the increased expression of synthetic HD-73 compared 
to synthetic HD-1 corresponds to about a 25-fold increased insecticidal efficacy in tomato. 
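
The 25-fold figure can be read as the product of the expression ratio and the relative activity of the two proteins. The sketch below assumes that efficacy scales multiplicatively with both quantities; that assumption and all variable names are illustrative and are not stated in the specification.

```python
# Inputs taken from the paragraph above; names are illustrative only.
expression_ratio_range = (4.0, 5.0)  # synthetic HD-73 vs. synthetic HD-1 in tomato
relative_activity = 5.0              # HD-73 vs. HD-1 activity against many pests

# Assumption: efficacy is proportional to (amount of protein) x (activity per unit protein).
efficacy_range = tuple(ratio * relative_activity for ratio in expression_ratio_range)

print(
    f"Estimated relative efficacy: {efficacy_range[0]:g}- to "
    f"{efficacy_range[1]:g}-fold"
)
```

Under this assumption the estimate spans 20- to 25-fold, with the stated 25-fold value corresponding to the upper end of the expression range.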

[0108] In order to determine the mechanisms involved in the increased expression of modified and synthetic B.t.k. 
HD-1 genes in tomato, S1 nuclease analysis of mRNA levels from transformed tomato plants was performed. As 
indicated above, a similar analysis had been performed with tobacco plants, and this analysis showed that the modified 
gene produced up to 10-fold more mRNA than the wild-type gene. The analysis in tomato utilized a different DNA probe 
that allowed the analysis of wild-type (pMON9921), modified (pMON5370) and synthetic (pMON5377) HD-1 genes 
with the same probe. This probe was derived from the 5' untranslated region of the CaMV35S promoter in pMON893 
that was common to all three of these vectors (pMON9921, pMON5370 and pMON5377). This S1 analysis indicated 
that B.t.k. mRNA levels from the modified gene were 3 to 5 fold higher than for the wild-type gene, and that mRNA 
levels for the synthetic gene were about 2 to 3 fold higher than for the modified gene. Three independent transformants 
were analyzed for each gene. Compared to the fold increases in B.t.k. HD-1 protein from these genes in tomato shown 
in Table XI, these mRNA increases can explain about half of the total protein increase, as was seen in tobacco for the 
wild-type and modified genes. For tomato the total mRNA increase from wild-type to synthetic is about 6 to 15 fold 
compared to a protein increase of about 50 fold. This result is similar to that seen for tobacco in comparing the 
wild-type and modified genes, and it extends to the synthetic gene as well. That is, about half of the total fold increase in 
B.t.k. protein from wild-type to modified genes can be explained by mRNA increases and about half by enhanced 
translational efficiency. The same is also true in comparing the modified gene to the synthetic gene. Although there is 
an additional increase in RNA levels, this mRNA increase can explain only about half of the total protein increase. 
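
The split between mRNA abundance and translational efficiency can be illustrated numerically. The sketch below multiplies the two stepwise mRNA increases to obtain the 6 to 15 fold total, then divides the roughly 50-fold protein increase by that total to get the residual factor not explained by mRNA. Expressing "about half" as a fraction on a logarithmic scale is an interpretive assumption made here, not something stated in the text, and all names are illustrative.

```python
import math

# Fold increases in mRNA reported from the S1 analysis (low, high).
mrna_modified_vs_wild = (3.0, 5.0)
mrna_synthetic_vs_modified = (2.0, 3.0)
# Approximate fold increase in protein, wild type to synthetic (Table XI).
protein_synthetic_vs_wild = 50.0

# Total mRNA increase is the product of the two stepwise increases.
mrna_total = (
    mrna_modified_vs_wild[0] * mrna_synthetic_vs_modified[0],
    mrna_modified_vs_wild[1] * mrna_synthetic_vs_modified[1],
)

# Residual factor not explained by mRNA (attributed in the text to
# enhanced translational efficiency), ordered (low, high).
residual = (
    protein_synthetic_vs_wild / mrna_total[1],
    protein_synthetic_vs_wild / mrna_total[0],
)

# Assumption: measure the mRNA share on a log scale, since fold changes multiply.
log_fraction = tuple(
    math.log(m) / math.log(protein_synthetic_vs_wild) for m in mrna_total
)

print(f"Total mRNA increase: {mrna_total[0]:g} to {mrna_total[1]:g} fold")
print(f"Residual factor: {residual[0]:.1f} to {residual[1]:.1f} fold")
print(f"mRNA share (log scale): {log_fraction[0]:.2f} to {log_fraction[1]:.2f}")
```

On this reading the mRNA increase accounts for somewhat under to somewhat over half of the protein increase, consistent with the "about half" characterization.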
[0109] The full length B.t.k. genes described above were also used to transform tomato plants and these plants were 
analyzed for B.t.k. protein and insecticidal efficacy. The results of this analysis are shown in Table XIII. Plants containing 
the synthetic/wild-type gene (pMON10506) produce the B.t.k. HD-73 protein at levels of about 0.01% of their total 
soluble protein. Plants containing the synthetic/modified gene (pMON10526) produce about 0.04% B.t.k. protein, and 
plants containing the fully synthetic gene (pMON10518) produce about 0.2% B.t.k. protein. These results are very 
similar to the tobacco plant results for the same genes. mRNA levels estimated by Northern blot analysis in tomato 
also increase in parallel with the protein level increase. As for tobacco with these three genes, most of the protein 
increase can be attributed to increased mRNA, with a small component of translational efficiency increase indicated 
for the fully synthetic gene. The highest levels of full length B.t.k. protein (from pMON10518) are comparable to or just 
slightly lower than the highest levels observed for the truncated HD-73 genes (pMON5383 and pMON5390). Tomato 
plants expressing these full length genes have the insecticidal activity expected for the observed protein levels as 
determined by feeding assays with beet armyworm or by diet incorporation of plant extracts with tobacco hornworm. 



Table XIII 

Full Length B.t.k. HD-73 Protein and mRNA Levels in Transgenic Tomato Plants 

Gene description       Vector        B.t.k. protein concentration    Relative B.t.k. mRNA level 
Synthetic/Wild type    pMON10506     100                             1 
Synthetic/modified     pMON10526     400                             2-4 
Fully synthetic        pMON10518     2000                            10 
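
The percentages quoted in the text and the concentrations in Table XIII appear to describe the same measurements in different units. As a hedged sketch, assuming the table values are in ng of B.t.k. protein per mg of total soluble protein (the table itself does not state its units), the conversion is shown below; the function and constant names are illustrative.

```python
NG_PER_MG = 1_000_000  # 1 mg = 1,000,000 ng


def percent_to_ng_per_mg(percent):
    """Convert percent of total soluble protein to ng per mg of total soluble protein."""
    return percent / 100 * NG_PER_MG


# Percentages quoted in the text for each vector.
percent_of_total_protein = {
    "pMON10506": 0.01,
    "pMON10526": 0.04,
    "pMON10518": 0.2,
}

for vector, percent in percent_of_total_protein.items():
    print(f"{vector}: {percent}% = {percent_to_ng_per_mg(percent):.0f} ng/mg")
```

The converted values agree with the 100, 400 and 2000 entries in the protein concentration column.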



c) Cotton. 

[0110] The generality of the increased expression of B.t.k. HD-1 and B.t.k. HD-73 by use of the modified and synthetic 
genes was extended to cotton. Transgenic calli were produced which contain the wild-type (pMON9921) and the 
synthetic HD-1 (pMON5377) genes. Here again the B.t.k. HD-1 protein produced from calli containing the wild-type gene 
was not detected, whereas calli containing the synthetic HD-1 gene expressed the HD-1 protein at easily detectable 
levels. The HD-1 protein was produced at approximately 1000 ng/mg of plant calli extract protein. Again, to ensure 
that the protein produced by the transgenic cotton calli was biologically active and that the increased expression 
observed with the synthetic gene translated to increased biological activity, extracts of cotton calli were made in a similar 
manner as described for tobacco plants, except that the calli were first dried between Whatman filter paper to remove 
as much of the water as possible. The dried calli were then ground in liquid nitrogen and then ground in 100 mM sodium 
carbonate buffer, pH 10. Approximately 0.5 ml aliquots of this material were applied to tomato leaves with a paint 
brush. After the leaf dried, five tobacco hornworm larvae were applied to each of two leaf samples. Leaves painted 
with extract from control calli were completely destroyed. Leaves painted with extract from calli containing the 
wild-type HD-1 gene (pMON9921) showed severe damage. Leaves painted with extract from calli containing the synthetic 
HD-1 gene (pMON5377) showed no damage (see Table XIV below). 



Table XIV 

Protection against Tobacco Hornworm by Tomato Leaves Painted with Extracts Prepared from Cotton Calli 
Containing a Control, the Wild-Type B.t.k. HD-1 Gene, Synthetic HD-1 Gene or Synthetic HD-73 Gene 

Gene Description    Vector      Tobacco Hornworm Damage* 
Control             Control     NL 
Wild type HD-1      pMON9921    3 
Synthetic HD-1      pMON5377    0 
Synthetic HD-73     pMON5383    0 

* Damage was rated as shown in Table IX. 



[0111] Cotton calli were also produced containing another synthetic gene, a gene encoding B.t.k. HD-73. The 
preparation of this gene is described in Example 3. Calli containing the synthetic HD-73 gene produced the corresponding 
HD-73 protein at even higher levels than the calli which contained the synthetic HD-1 gene. Extracts made from calli 
containing the HD-73 synthetic gene (pMON5383) showed complete control of tobacco hornworm when painted onto 
tomato leaves as described above for extracts containing the HD-1 protein (see Table XIV). 
[0112] Transgenic cotton plants containing the synthetic B.t.k. HD-1 gene (pMON5377) or the synthetic B.t.k. 
HD-73 gene (pMON5383) have also been examined. These plants produce the HD-1 or HD-73 proteins at levels 
comparable to that seen in cotton callus with the same genes and comparable to tomato and tobacco plants with these genes. 
For either synthetic truncated HD-1 or HD-73 genes, cotton plants expressing B.t.k. protein at 1000 to 2000 ng/mg 
total protein (0.1% to 0.2%) were recovered at a high frequency. Insect feeding assays were performed with leaves 
from cotton plants expressing the synthetic HD-1 or HD-73 genes. These leaves showed no damage (rating of 0) when 
challenged with larvae of cabbage looper (Trichoplusia ni), and only slight damage when challenged with larvae of 
beet armyworm (Spodoptera exigua). Damage ratings are as defined in Table IX above. This demonstrated that cotton 
plants as well as calli expressed the synthetic HD-1 or HD-73 genes at high levels and that those plants were protected 
from damage by lepidopteran insect larvae. 

[0113] Transgenic cotton plants containing either the synthetic truncated HD-1 gene (pMON5377) or the synthetic 
truncated HD-73 gene (pMON5383) were also assessed for protection against cotton bollworm at the whole plant level 
in the greenhouse. This is a more realistic test of the ability of these plants to produce an agriculturally acceptable level 
of control. The cotton bollworm (Heliothis zea) is a major pest of cotton that produces economic damage by destroying 
terminals, squares and bolls, and protection of these fruiting bodies as well as the leaf tissue will be important for 
effective insect control and adequate crop protection. To test the protection afforded to whole plants, R1 progeny of 
cotton plants expressing high levels of either B.t.k. HD-1 (pMON5377) or B.t.k. HD-73 (pMON5383) were assayed by 
applying 10-15 eggs of cotton bollworm per boll or square to the 20 uppermost squares or bolls on each plant. At least 
12 plants were analyzed per treatment. The hatch rate of the eggs was approximately 70%. This corresponds to very 
high insect pressure compared to numbers of larvae per plant seen under typical field conditions. Under these 
conditions 100% of the bolls on control cotton plants were destroyed by insect damage. For the transgenics, significant boll 
protection was observed. Plants containing pMON5377 (HD-1) had 70-75% of the bolls survive the intense pressure 
of this assay. Plants containing pMON5383 (HD-73) had 80% to 90% boll protection. This is likely to be a consequence 
of the higher activity of HD-73 protein against cotton bollworm compared to HD-1 protein. In cases where the transgenic 
plants were damaged by the insects, the surviving larvae were delayed in their development by at least one instar. 
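
The insect pressure in this assay can be estimated from the figures given above. The sketch below simply multiplies eggs per site, sites per plant and the approximate hatch rate; it assumes every treated square or boll received the stated number of eggs, and the variable names are illustrative.

```python
# Assay parameters from the paragraph above; names are illustrative only.
eggs_per_site_range = (10, 15)  # eggs applied per boll or square
sites_per_plant = 20            # uppermost squares or bolls treated per plant
hatch_rate = 0.70               # approximate fraction of eggs that hatched

# Expected number of hatched larvae per plant at the low and high egg counts.
larvae_per_plant = tuple(
    eggs * sites_per_plant * hatch_rate for eggs in eggs_per_site_range
)

print(
    f"Expected larvae per plant: about {larvae_per_plant[0]:.0f} "
    f"to {larvae_per_plant[1]:.0f}"
)
```

This works out to roughly 140 to 210 hatched larvae per plant, which gives a sense of why the text describes the pressure as very high.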
[0114] Therefore, the increased expression obtained with the modified and synthetic genes is not limited to any one 
crop; tobacco, tomato and cotton calli and cotton plants all showed drastic increases in B.t.k. expression when the 
plants/calli were produced containing the modified or synthetic genes. Likewise, the utility of changes made to produce 
the modified and synthetic B.t.k. HD-1 gene is not limited to the HD-1 gene. The synthetic HD-73 gene in all three 
species also showed drastic increases in expression. 

[0115] In summary, it has been demonstrated that: (1) the genetic changes made in the HD-1 modified gene lead to 
very significant increases in B.t.k. HD-1 expression; (2) production of a totally synthetic gene led to a further five-fold 
increase in B.t.k. HD-1 expression; (3) the changes incorporated into the modified HD-1 gene accounted for the majority 
of the increased B.t.k. expression observed with the synthetic gene; (4) the increased expression was demonstrated 
in three different plants: tobacco plants, tomato plants, and cotton calli and cotton plants; (5) the increased expression 
as observed by Western analysis also correlated with similar increases in bioactivity, showing that the B.t.k. HD-1 
proteins produced were comparably active; (6) when the method of the present invention used to design the synthetic 
HD-1 gene was employed to design a synthetic HD-73 gene, it also was expressed at much higher levels in tobacco, 
tomato and cotton than the wild-type equivalent gene, with consequent increases in bioactivity; (7) a fully synthetic full 
length B.t.k. gene was expressed at levels comparable to synthetic truncated genes. 

Example 5 -- Synthetic B.t. tenebrionis Gene in Tobacco, Tomato and Potato 


[0116] Referring to Figure 12, a synthetic gene encoding a Coleopteran active toxin is prepared by making the 
indicated changes in the wild-type gene of B.t. tenebrionis or by de novo synthesis of the synthetic structural gene. The 
synthetic gene is inserted into an intermediate plant transformation vector such as pMON893. Plasmid pMON893 
containing the synthetic B.t.t. gene is then inserted into a suitable disarmed Agrobacterium strain such as A. tumefaciens 
ACO. 

Transformation and Regeneration of Potato 

[0117] Sterile shoot cultures of Russet Burbank are maintained in vials containing 10 ml of PM medium (Murashige 
and Skoog (MS) inorganic salts, 30 g/l sucrose, 0.17 g/l NaH2PO4·H2O, 0.4 mg/l thiamine-HCl, and 100 mg/l 
myo-inositol, solidified with 1 g/l Gelrite at pH 6.0). When shoots reach approximately 5 cm in length, stem internode 
segments of 7-10 mm are excised and smeared at the cut ends with a disarmed Agrobacterium tumefaciens vector 
containing the synthetic B.t.t. gene from a four day old plate culture. The stem explants are co-cultured for three days 
at 23°C on a sterile filter paper placed over 1.5 ml of a tobacco cell feeder layer overlaid on 1/10 P medium (1/10 
strength MS inorganic salts and organic addenda without casein as in Jarret et al. (1980), 30 g/l sucrose and 8.0 g/l 
agar). Following co-culture the explants are transferred to full strength P-1 medium for callus induction, composed of 
MS inorganic salts, organic additions as in Jarret et al. (1980) with the exception of casein, 3.0 mg/l benzyladenine 
(BA), and 0.01 mg/l naphthaleneacetic acid (NAA) (Jarret et al., 1980). Carbenicillin (500 mg/l) is included to inhibit 
bacterial growth, and 100 mg/l kanamycin is added to select for transformed cells. After four weeks the explants are 
transferred to medium of the same composition but with 0.3 mg/l gibberellic acid (GA3) replacing the BA and NAA 
(Jarret et al., 1981) to promote shoot formation. Shoots begin to develop approximately two weeks after transfer to 
shoot induction medium; these are excised and transferred to vials of PM medium for rooting. Shoots are tested for 
kanamycin resistance conferred by the enzyme neomycin phosphotransferase II, by placing a section of the stem onto 
callus induction medium containing MS organic and inorganic salts, 30 g/l sucrose, 2.25 mg/l BA, 0.186 mg/l NAA, 
10 mg/l GA3 (Webb et al., 1983) and 200 mg/l kanamycin to select for transformed cells. 

[0118] The synthetic B.t.t. gene described in Figure 12 was placed into a plant expression vector as described in 
Example 5. The plasmid has the following characteristics: a synthetic BglII fragment having approximately 1800 base 
pairs was inserted into pMON893 in such a manner that the enhanced 35S promoter would express the B.t.t. gene. 
This construct, pMON1982, was used to transform both tobacco and tomato. Tobacco plants, selected as kanamycin 
resistant plants, were screened with rabbit anti-B.t.t. antibody. Cross-reactive material was detected at levels predicted 
to be suitable to cause mortality to CPB. These target insects will not feed on tobacco, but the transgenic tobacco 
plants do demonstrate that the synthetic gene does improve expression of this protein to detectable levels. 

[0119] Tomato plants with the pMON1982 construct were determined to produce B.t.t. protein at levels insecticidal 
to CPB. In initial studies, the leaves of four plants (5190, 5225, 5328 and 5133) showed little or no damage when 
exposed to CPB larvae (damage rating of 0-1 on a scale of 0 to 4, with 4 as no leaf remaining). Under these conditions 
the control leaves were completely eaten. Immunological analysis of these plants confirmed the presence of material 
cross-reactive with anti-B.t.t. antibody. Levels of protein expression in these plants were estimated at approximately 1 
to 5 ng of B.t.t. protein in 50 µg of total extractable protein. A total of 17 tomato plants (17 of 65 tested) have been 
identified which demonstrate protection of leaf tissue from CPB (rating of 0 or 1) and show good insect mortality. 
[0120] Results similar to those seen in tobacco and tomato with pMON1982 were seen with pMON1984 in the same 
plant species. pMON1984 is identical to pMON1982 except that the synthetic protease inhibitor (CMTI) is fused 
upstream of the native proteolytic cleavage site. Levels of expression in tobacco were estimated to be similar to 
pMON1982, between 10 and 15 ng per 50 µg of total soluble protein. 

[0121] Tomato plants expressing pMON1984 have been identified which protect the leaves from ingestion by CPB. 
The damage rating was 0 with 100% insect mortality. 

[0122] Potato was transformed as described in Example 5 with a vector similar to pMON1982 containing the enhanced 
CaMV35S/synthetic B.t.t. gene. Leaves of potato plants transformed with this vector were screened by CPB insect 
bioassay. Of the 35 plants tested, leaves from 4 plants, 16a, 13c, 13d, and 23a, were totally protected when challenged. 
Insect bioassays with leaves from three other plants, 13e, 1a, and 13b, recorded damage levels of 1 on a scale of 0 to 
4, with 4 being total devastation of the leaf material. Immunological analysis confirmed the presence of B.t.t. 
cross-reactive material in the leaf tissue. The level of B.t.t. protein in leaf tissue of plant 16a (damage rating of 0) was estimated 
at 20-50 ng of B.t.t. protein/50 µg of total soluble protein. The level of B.t.t. protein seen in 16a tissue was consistent 
with its biological activity. Immunological analysis of 13e and 13b (tissue which scored 1 in damage rating) revealed less 
protein (5-10 ng/50 µg of total soluble protein) than in plant 16a. Cuttings of plant 16a were challenged with 50 to 200 
eggs of CPB in a whole plant assay. Under these conditions 16a showed no damage and 100% mortality of insects 
while control potato plants were heavily damaged. 

Example 6 - Synthetic B.t.k. P2 Protein Gene 

[0123] The P2 protein is a distinct insecticidal protein produced by some strains of B.t. including B.t.k. HD-1. It is 
characterized by its activity against both lepidopteran and dipteran insects (Yamamoto and Iizuka, 1983). Genes 
encoding the P2 protein have been isolated and characterized (Donovan et al., 1988). The P2 proteins encoded by these 
genes are approximately 600 amino acids in length. These proteins share only limited homology with the lepidopteran 
specific P1 type proteins, such as the B.t.k. HD-1 and HD-73 proteins described in previous examples. 
[0124] The P2 proteins have substantial activity against a variety of lepidopteran larvae including cabbage looper, 
tobacco hornworm and tobacco budworm. Because they are active against agronomically important insect pests, the 
P2 proteins are a desirable candidate in the production of insect tolerant transgenic plants either alone or in combination 
with the other B.t. toxins described in the above examples. In some plants, expression of the P2 protein alone might 
be sufficient to provide protection against damaging insects. In addition, the P2 proteins might provide protection against 
agronomically important dipteran pests. In other cases, expression of P2 together with the B.t.k. HD-1 or HD-73 protein 
might be preferred. The P2 proteins should provide at least an additive level of insecticidal activity when combined 
with the crystal protein toxin of B.t.k. HD-1 or HD-73, and the combination may even provide a synergistic activity. 
Although the mode of action of the P2 protein is unknown, its distinct amino acid sequence suggests that it functions 
differently from the B.t.k. HD-1 and HD-73 type of proteins. Production of two insect tolerance proteins with different 
modes of action in the same plant would minimize the potential for development of insect resistance to B.t. proteins in 
plants. The lack of substantial DNA homology between P2 genes and the HD-1 and HD-73 genes minimizes the 
potential for recombination between multiple insect tolerance genes in the plant chromosome. 
[0125] The genes encoding the P2 protein, although distinct in sequence from the B.t.k. HD-1 and HD-73 genes, share 
many common features with these genes. In particular, the P2 protein genes have a high A+T content (65%), multiple 
potential polyadenylation signal sequences (26) and numerous ATTTA sequences (10). Because of its overall similarity 
to the poorly expressed wild-type B.t.k. HD-1 and HD-73 genes, the same problems are expected in expression of the 
wild-type P2 gene as were encountered with the previous examples. Based on the above-described method for 
designing the synthetic B.t. genes, a synthetic P2 gene has been designed which should be expressed at adequate 
levels for protection in plants. A comparison of the wild-type and synthetic P2 genes is shown in Figure 13. 
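
The sequence features cited here (A+T content, ATTTA motifs and potential polyadenylation signals) can be tallied programmatically for any coding sequence. The following is a minimal sketch rather than the procedure used in the specification: the three polyadenylation hexamers listed are an assumed illustrative subset, the demonstration sequence is invented, and the function names are hypothetical.

```python
import re

# Assumption: a small illustrative subset of potential plant polyadenylation
# signals. The full list used in the specification is not reproduced here.
ASSUMED_POLYA_SIGNALS = ("AATAAA", "AATAAT", "ATTAAA")


def count_overlapping(sequence, motif):
    """Count occurrences of motif in sequence, including overlapping ones."""
    # A zero-width lookahead matches at every start position of the motif.
    return len(re.findall(f"(?={re.escape(motif)})", sequence))


def sequence_features(sequence, polya_signals=ASSUMED_POLYA_SIGNALS):
    """Return A+T fraction, ATTTA count and potential polyA signal count."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    at_fraction = (seq.count("A") + seq.count("T")) / len(seq)
    attta_count = count_overlapping(seq, "ATTTA")
    polya_signal_count = sum(count_overlapping(seq, s) for s in polya_signals)
    return {
        "at_fraction": at_fraction,
        "attta_count": attta_count,
        "polya_signal_count": polya_signal_count,
    }


# Invented demonstration sequence, not taken from any B.t. gene.
demo = "ATGAATAAATTTATTTAGC"
features = sequence_features(demo)
print(f"A+T content: {features['at_fraction']:.1%}")
print(f"ATTTA motifs: {features['attta_count']}")
print(f"Potential polyA signals: {features['polya_signal_count']}")
```

Overlapping matches are counted because adjacent ATTTA motifs can share a terminal A, as in ATTTATTTA. Running the same tally on a wild-type and a redesigned sequence gives a quick check of how many of these features a redesign has removed.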

Example 7 - Synthetic B.t. Entomocidus Gene 

[0126] The B.t. entomocidus ("Btent") protein is a distinct insecticidal protein produced by some strains of B.t. 
bacteria. It is characterized by its high level of activity against some lepidopterans that are relatively insensitive to B.t.k. 
HD-1 and HD-73, such as Spodoptera species including beet armyworm (Visser et al., 1988). Genes encoding the 
Btent protein have been isolated and characterized (Honee et al., 1988). The Btent proteins encoded by these genes 
are approximately the same length as B.t.k. HD-1 and HD-73. These proteins share only 68% amino acid homology 
with the B.t.k. HD-1 and HD-73 proteins. It is likely that only the N-terminal half of the Btent protein is required for 
insecticidal activity, as is the case for HD-1 and HD-73. Over the first 625 amino acids, Btent shares only 38% amino 
acid homology with HD-1 and HD-73. 

[0127] Because of their higher activity against Spodoptera species that are relatively insensitive to HD-1 and 
HD-73, the Btent proteins are a desirable candidate for the production of insect tolerant transgenic plants either alone or 
in combination with the other B.t. toxins described in the above examples. In some plants production of Btent alone 
might be sufficient to control the agronomically important pests. In other plants, the production of two distinct insect 
tolerance proteins would provide protection against a wider array of insects. Against those insects where both proteins 
are active, the combination of the B.t.k. HD-1 or HD-73 type protein plus the Btent protein should provide at least 
additive insecticidal efficacy, and may even provide a synergistic activity. In addition, because of its distinct amino acid 
sequence, the Btent protein may have a different mode of action than HD-1 or HD-73. Production of two insecticidal 
proteins in the same plant with different modes of action would minimize the potential for development of insect 
resistance to B.t. proteins in plants. The relative lack of DNA sequence homology with the B.t.k. type genes minimizes the 
potential for recombination between multiple insect tolerance genes in the plant chromosome. 

[0128] The genes encoding the Btent protein, although distinct in sequence from the B.t.k. HD-1 and HD-73 genes, 
share many common features with these genes. In particular, the Btent protein genes have a high A+T content (62%), 
multiple potential polyadenylation signal sequences (39 in the full length coding sequence and 27 in the first 1875 
nucleotides that are likely to encode the active toxic fragment) and numerous ATTTA sequences (16 in the full length 
coding sequence and 12 in the first 1875 nucleotides). Because of their overall similarity to the poorly expressed 
wild-type B.t.k. HD-1 and HD-73 genes, the wild-type Btent genes are expected to exhibit similar problems in expression 
as were encountered with the wild-type HD-1 and HD-73 genes. Based on the above-described method used for 
designing the other synthetic B.t. genes, a synthetic Btent gene has been designed which should be expressed 
at adequate levels for protection in plants. A comparison of the wild-type and synthetic Btent genes is shown in Figure 
14. 

Example 8 - Synthetic B.t.k. Genes for Expression in Corn 

[0129] High level expression of heterologous genes in corn cells has been shown to be enhanced by the presence 
of a corn gene intron (Callis et al., 1987). Typically these introns have been located in the 5' untranslated region of the 
chimeric gene. It has been shown that the CaMV35S promoter and the NOS 3' end function efficiently in the expression 
of heterologous genes in corn cells (Fromm et al., 1986). 

[0130] Referring to Figure 15, a plant expression cassette vector (pMON744) was constructed that contains these 
sequences. Specifically the expression cassette contains the enhanced CaMV 35S promoter followed by intron 1 of 
the corn Adh1 gene (Callis et al., 1987). This is followed by a multilinker cloning site for insertion of coding sequences; 
this multilinker contains a BglII site among others. Following the multilinker is the NOS 3' end. pMON744 also contains 
the selectable marker gene 35S/NPTII/NOS 3' for kanamycin selection of transgenic corn cells. In addition, pMON744 
has an E. coli origin of replication and an ampicillin resistance gene for selection of the plasmid in E. coli. 
[0131] Five B.t.k. coding sequences described in the previous examples were inserted into the BglII site of pMON744 
for corn cell expression of B.t.k. The coding sequences inserted and resulting vectors were: 

1. Wild type B.t.k. HD-1 from pMON9921 to make pMON8652. 

2. Modified B.t.k. HD-1 from pMON5370 to make pMON8642. 

3. Synthetic B.t.k. HD-1 from pMON5377 to make pMON8643. 

4. Synthetic B.t.k. HD-73 from pMON5390 to make pMON8644. 

5. Synthetic full length B.t.k. HD-73 from pMON10518 to make pMON10902. 

[0132] pMON8652 (wild-type B.t.k. HD-1) was used to transform corn cell protoplasts and stably transformed 
kanamycin resistant callus was isolated. B.t.k. mRNA in the corn cells was analyzed by nuclease S1 protection and found 
to be present at a level comparable to that seen with the same wild-type coding sequence (pMON9921) in transgenic 
tomato plants. 

[0133] pMON8652 and pMON8642 (modified HD-1) were used to transform corn cell protoplasts in a transient 
expression system. The level of B.t.k. mRNA was analyzed by nuclease S1 protection. The modified HD-1 gave rise to 
a several fold increase in B.t.k. mRNA compared to the wild-type coding sequence in the transiently transformed corn 
cells. This indicated that the modifications introduced into the B.t.k. HD-1 gene are capable of enhancing B.t.k. 
expression in monocot cells, as was demonstrated for dicot plants and cells. 

[0134] pMON8642 (modified HD-1) and pMON8643 (synthetic HD-1) were used to transform Black Mexican Sweet 
(BMS) corn cell protoplasts by PEG-mediated DNA uptake, and stably transformed corn callus was selected by growth 
on kanamycin containing plant growth medium. Individual callus colonies that were derived from single transformed 
cells were isolated and propagated separately on kanamycin containing medium. 

[0135] To assess the expression of the B.t.k. genes in these cells, callus samples were tested for insect toxicity by 
bioassay against tobacco hornworm larvae. For each vector, 96 callus lines were tested by bioassay. Portions of each 
callus were placed on sterile water agar plates, and five neonate tobacco hornworm larvae were added and allowed 
to feed for 4 days. For pMON8643, 100% of the larvae died after feeding on 15 of the 96 calli and these calli showed 
little feeding damage. For pMON8642, only 1 of the 96 calli was toxic to the larvae. This showed that the B.t.k. gene 
was being expressed in these samples at insecticidal levels. The observation that significantly more calli containing 
pMON8643 were toxic than for pMON8642 showed that significantly higher levels of expression were obtained when 
the synthetic HD-1 coding sequence was contained in corn cells than when the modified HD-1 coding sequence was 
used, similar to the previous examples with dicot plants. A semiquantitative immunoassay showed that the pMON8643 
toxic samples had significantly higher B.t.k. protein levels than the pMON8642 toxic sample. 
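
The specification does not report a statistical test for the difference between 15 of 96 and 1 of 96 toxic calli. As an illustration only, and assuming the callus lines are independent, a one-sided Fisher exact test can be computed from the hypergeometric distribution using the standard library. The function and variable names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k_observed, group_size, total_positive, total):
    """P(X >= k_observed) when group_size items are drawn without replacement
    from `total` items of which `total_positive` are positive."""
    denominator = comb(total, group_size)
    upper = min(group_size, total_positive)
    numerator = sum(
        comb(total_positive, k) * comb(total - total_positive, group_size - k)
        for k in range(k_observed, upper + 1)
    )
    return numerator / denominator


# Counts from the bioassay described above.
toxic_synthetic, tested_synthetic = 15, 96  # pMON8643
toxic_modified, tested_modified = 1, 96     # pMON8642

total_lines = tested_synthetic + tested_modified
total_toxic = toxic_synthetic + toxic_modified

# One-sided p-value: probability of 15 or more of the 16 toxic lines falling
# in the synthetic-gene group by chance alone.
p_value = hypergeom_upper_tail(
    toxic_synthetic, tested_synthetic, total_toxic, total_lines
)

print(f"Fraction toxic, pMON8643: {toxic_synthetic / tested_synthetic:.1%}")
print(f"Fraction toxic, pMON8642: {toxic_modified / tested_modified:.1%}")
print(f"One-sided Fisher exact p-value: {p_value:.2e}")
```

Under the independence assumption the p-value is well below 0.001, in line with the description of the difference as significant.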
[0136] The 16 callus samples that were toxic to tobacco hornworm were also tested for activity against European 
corn borer. European corn borer is approximately 40-fold less sensitive to the HD-1 gene product than is tobacco 
hornworm. Larvae of European corn borer were applied to the callus samples and allowed to feed for 4 days. Two of 
the 16 calli tested, both of which contained pMON8643 (synthetic HD-1), were toxic to European corn borer larvae. 
[0137] To assess the expression of the B.t.k. genes in differentiated corn tissue, another method of DNA delivery
was used. Young leaves were excised from corn plants, and DNA samples were delivered into the leaf tissue by
microprojectile bombardment. In this system, the DNA on the microprojectiles is transiently expressed in the leaf cells
after bombardment. Three DNA samples were used, and each DNA was tested in triplicate.

1. pMON744, the corn expression vector with no B.t.k. gene.

2. pMON8643 (synthetic HD-1).

3. pMON752, a corn expression vector for the GUS gene, no B.t.k. gene.

[0138] The leaves were incubated at room temperature for 24 hours. The pMON752 samples were stained with a
substrate that allows visual detection of the GUS gene product. This analysis showed that over one hundred spots in
each sample were expressing the GUS product and that the triplicate samples showed very similar levels of GUS
expression. For the pMON744 and pMON8643 samples, 5 larvae of tobacco hornworm were added to each leaf and
allowed to feed for 48 hours. All three samples bombarded with pMON744 showed extensive feeding damage and no
larval mortality. All three samples bombarded with pMON8643 showed no evidence of feeding damage and 100%
larval mortality. The samples were also assayed for the presence of B.t.k. protein by a qualitative immunoassay. All of
the pMON8643 samples had detectable B.t.k. protein. These results demonstrated that the synthetic B.t.k. gene
was expressed in differentiated corn plant tissue at insecticidal levels.


Example 9 -- Expression of Synthetic B.t. Genes with RUBISCO Small Subunit Promoters and Chloroplast Transit
Peptides

[0139] The genes in plants encoding the small subunit of RUBISCO (SSU) are often highly expressed, light regulated
and sometimes show tissue specificity. These expression properties are largely due to the promoter sequences of
these genes. It has been possible to use SSU promoters to express heterologous genes in transformed plants. Typically
a plant will contain multiple SSU genes, and the expression levels and tissue specificity of different SSU genes will be
different. The SSU proteins are encoded in the nucleus and synthesized in the cytoplasm as precursors that contain
an N-terminal extension known as the chloroplast transit peptide (CTP). The CTP directs the precursor to the chloroplast
and promotes the uptake of the SSU protein into the chloroplast. In this process, the CTP is cleaved from the SSU
protein. These CTP sequences have been used to direct heterologous proteins into chloroplasts of transformed plants.
[0140] The SSU promoters might have several advantages for expression of B.t.k. genes in plants. Some SSU promoters
are very highly expressed and could give rise to expression levels as high as or higher than those observed with
the CaMV35S promoter. The tissue distribution of expression from SSU promoters is different from that of the CaMV35S
promoter, so for control of some insect pests, it may be advantageous to direct the expression of B.t.k. to those cells
in which SSU is most highly expressed. For example, although relatively constitutive, in the leaf the CaMV35S promoter
is more highly expressed in vascular tissue than in some other parts of the leaf, while most SSU promoters are most
highly expressed in the mesophyll cells of the leaf. Some SSU promoters also are more highly tissue specific, so it
could be possible to utilize a specific SSU promoter to express B.t.k. in only a subset of plant tissues, if for example
B.t. expression in certain cells was found to be deleterious to those cells. For example, for control of Colorado potato
beetle in potato, it may be advantageous to use SSU promoters to direct B.t.t. expression to the leaves but not to the
edible tubers.

[0141] Utilizing SSU CTP sequences to localize B.t. proteins to the chloroplast might also be advantageous. Localization
of the B.t. protein to the chloroplast could protect the protein from proteases found in the cytoplasm. This could stabilize
the B.t. protein and lead to higher levels of accumulation of active protein. B.t. genes containing the CTP could be used
in combination with the SSU promoter or with other promoters such as CaMV35S.

[0142] A variety of plant transformation vectors were constructed for the expression of B.t.k. genes utilizing SSU
promoters and SSU CTPs. The promoters and CTPs utilized were from the petunia SSU11a gene described by Turner
et al. (1986) and from the Arabidopsis ats1A gene (an SSU gene) described by Krebbers et al. (1988) and by Elionor
et al. (1989). The petunia SSU11a promoter was contained on a DNA fragment that extended approximately 800 bp
upstream of the SSU coding sequence. The Arabidopsis ats1A promoter was contained on a DNA fragment that
extended approximately 1.8 kb upstream of the SSU coding sequence. At the upstream end, convenient sites from the
multilinker of pUC18 were used to move these promoters into plant transformation vectors such as pMON893. These
promoter fragments extended to the start of the SSU coding sequence, at which point an NcoI restriction site was
engineered to allow insertion of the B.t. coding sequence, replacing the SSU coding sequence.
[0143] When SSU promoters were used in combination with their CTP, the DNA fragments extended through the
coding sequence of the CTP and a small portion of the mature SSU coding sequence, at which point an NcoI restriction
site was engineered by standard techniques to allow the in frame fusion of B.t. coding sequences with the CTP. In
particular, for the petunia SSU11a CTP, B.t. coding sequences were fused to the SSU sequence after amino acid 8 of
the mature SSU sequence, at which point the NcoI site was placed. The 8 amino acids of mature SSU sequence were
included because preliminary in vitro chloroplast uptake experiments indicated that uptake of B.t.k. was observed
only if this segment of mature SSU was included. For the Arabidopsis ats1A CTP, the complete CTP was included plus
24 amino acids of mature SSU sequence plus the sequence gly-gly-arg-val-asn-cys-met-gln-ala-met, terminating in
an NcoI site for B.t. fusion. This short sequence reiterates the native SSU CTP cleavage site (between the cys and
met) plus a short segment surrounding the cleavage site. This sequence was included in order to ensure proper uptake
into chloroplasts. B.t. coding sequences were fused to this ats1A CTP after the met codon. In vitro uptake experiments
with this CTP construction and other (non-B.t.) coding sequences showed that this CTP did target proteins to the
chloroplast.

[0144] When CTPs were used in combination with the CaMV35S promoter, the same CTP segments were used.
They were excised just upstream of the ATG start sites of the CTP by engineering of BglII sites, and placed downstream
of the CaMV35S promoter in pMON893, as BglII to NcoI fragments. B.t. coding sequences were fused as described
above.

[0145] The wild type B.t.k. HD-1 coding sequence of pMON9921 (see Figure 1) was fused to the ats1A promoter to
make pMON1925 or the ats1A promoter plus CTP to make pMON1921. These vectors were used to transform tobacco
plants, and the plants were screened for activity against tobacco hornworm. No toxic plants were recovered. This is
surprising in light of the fact that toxic plants could be recovered, albeit at a low frequency, after transformation with
pMON9921, in which the B.t.k. coding sequence was expressed from the enhanced CaMV35S promoter in pMON893,
and in light of the fact that Elionor et al. (1989) report that the ats1A promoter itself is comparable in strength to the
CaMV35S promoter and approximately 10-fold stronger when the CTP sequence is included. At least for the wild-type
B.t.k. HD-1 coding sequence, this does not appear to be the case.

[0146] A variety of plant transformation vectors were constructed utilizing either the truncated synthetic HD-73
coding sequence of Figure 4 or the full length B.t.k. HD-73 coding sequence of Figure 11. These are listed in the table
below.

Table XV

Gene Constructs with CTPs

Vector      Promoter  CTP      B.t.k. HD-73 Coding Sequence
pMON10806   En 35S    ats1A    truncated
pMON10814   En 35S    SSU11a   full length
pMON10811   SSU11a    SSU11a   truncated
pMON10819   SSU11a    none     truncated
pMON10815   ats1A     none     truncated
pMON10817   ats1A     ats1A    truncated
pMON10821   En 35S    ats1A    truncated
pMON10822   En 35S    ats1A    full length
pMON10838   SSU11a    SSU11a   full length
pMON10839   ats1A     ats1A    full length

[0147] All of the above vectors were used to transform tobacco plants. For all of the vectors containing truncated
B.t.k. genes, leaf tissue from these plants has been analyzed for toxicity to insects and B.t.k. protein levels by
immunoassay. pMON10806, 10811, 10819 and 10821 produce levels of B.t.k. protein comparable to pMON5383 and
pMON5390, which contain synthetic B.t.k. HD-73 coding sequences driven by the En 35S promoter itself with no CTP.
These plants also have the insecticidal activity expected for the B.t.k. protein levels detected. For pMON10815 and
pMON10817 (containing the ats1A promoter), the level of B.t.k. protein is about 5-fold higher than that found in plants
containing pMON5383 or 5390. These plants also have higher insecticidal activity. Plants containing 10815 and 10817
contain up to 1% of their total soluble leaf protein as B.t.k. HD-73. This is the highest level of B.t.k. protein yet obtained
with any of the synthetic genes.

[0148] This result is surprising in two respects. First, as noted above, the wild type coding sequences fused to the
ats1A promoter and CTP did not show any evidence of higher levels of expression than for En 35S, and in fact had
lower expression based on the absence of any insecticidal plants. Second, Elionor et al. (1989) show that for two other
genes, the ats1A CTP can increase expression from the ats1A promoter by about 10-fold. For the synthetic B.t.k. HD-73
gene, there is no consistent increase seen by including the CTP over and above that seen for the ats1A promoter
alone.

[0149] Tobacco plants containing the full length synthetic HD-73 fused to the SSU11a CTP and driven by the En
35S promoter produced levels of B.t.k. protein and insecticidal activity comparable to pMON10518, which does
not include the CTP. In addition, for pMON10518 the B.t.k. protein extracted from plants was observed by gel
electrophoresis to contain multiple forms less than full length, apparently due to the cleavage of the C-terminal portion (not
required for toxicity) in the cytoplasm. For pMON10814, the majority of the protein appeared to be intact full length,
indicating that the protein has been stabilized from proteolysis by targeting to the chloroplast.

Example 10 -- Targeting of B.t. Proteins to the Extracellular Space or Vacuole through the Use of Signal Peptides

[0150] The B.t. proteins produced from the synthetic genes described here are localized to the cytoplasm of the plant
cell, and this cytoplasmic localization results in plants that are insecticidally effective. It may be advantageous for some
purposes to direct the B.t. proteins to other compartments of the plant cell. Localizing B.t. proteins in compartments
other than the cytoplasm may result in less exposure of the B.t. proteins to cytoplasmic proteases, leading to greater
accumulation of the protein and yielding enhanced insecticidal activity. Extracellular localization could lead to more efficient
exposure of certain insects to the B.t. proteins, leading to greater efficacy. If a B.t. protein were found to be deleterious
to plant cell function, then localization to a noncytoplasmic compartment could protect these cells from the B.t. protein.
[0151] In plants as well as other eucaryotes, proteins that are destined to be localized either extracellularly or in
several specific compartments are typically synthesized with an N-terminal amino acid extension known as the signal
peptide. This signal peptide directs the protein to enter the compartmentalization pathway, and it is typically cleaved
from the mature protein as an early step in compartmentalization. For an extracellular protein, the secretory pathway
typically involves cotranslational insertion into the endoplasmic reticulum with cleavage of the signal peptide occurring
at this stage. The mature protein then passes through the Golgi body into vesicles that fuse with the plasma membrane,
thus releasing the protein into the extracellular space. Proteins destined for other compartments follow a similar pathway.
For example, proteins that are destined for the endoplasmic reticulum or the Golgi body follow this scheme, but
they are specifically retained in the appropriate compartment. In plants, some proteins are also targeted to the vacuole,
another membrane bound compartment in the cytoplasm of many plant cells. Vacuole targeted proteins diverge from
the above pathway at the Golgi body where they enter vesicles that fuse with the vacuole.

[0152] A common feature of this protein targeting is the signal peptide that initiates the compartmentalization process.
Fusing a signal peptide to a protein will in many cases lead to the targeting of that protein to the endoplasmic reticulum.
The efficiency of this step may depend on the sequence of the mature protein itself as well. The signals that direct a
protein to a specific compartment rather than to the extracellular space are not as clearly defined. It appears that many
of the signals that direct the protein to specific compartments are contained within the amino acid sequence of the
mature protein. This has been shown for some vacuole targeted proteins, but it is not yet possible to define these
sequences precisely. It appears that secretion into the extracellular space is the "default" pathway for a protein that
contains a signal sequence but no other compartmentalization signals. Thus, a strategy to direct B.t. proteins out of
the cytoplasm is to fuse the synthetic B.t. genes to DNA sequences encoding known plant signal peptides.
These fusion genes will give rise to B.t. proteins that enter the secretory pathway, and lead to extracellular secretion
or targeting to the vacuole or other compartments.

[0153] Signal sequences for several plant genes have been described. One such sequence is for the tobacco
pathogenesis related protein PR1b described by Cornelissen et al. The PR1b protein is normally localized to the extracellular
space. Another type of signal peptide is contained on seed storage proteins of legumes. These proteins are localized
to the protein body of seeds, which is a vacuole like compartment found in seeds. A signal peptide DNA sequence for
the beta subunit of the 7S storage protein of common bean (Phaseolus vulgaris), PvuB, has been described by Doyle
et al. Based on these published sequences, genes were synthesized by chemical synthesis of
oligonucleotides that encoded the signal peptides for PR1b and PvuB. The synthetic genes for these signal peptides
corresponded exactly to the reported DNA sequences. Just upstream of the translational initiation codon of each signal
peptide a BamHI and BglII site were inserted with the BamHI site at the 5' end. This allowed the insertion of the signal
peptide encoding segments into the BglII site of pMON893 for expression from the En 35S promoter. In some cases
to achieve secretion or compartmentalization of heterologous proteins, it has proved necessary to include some amino
acid sequence beyond the normal cleavage site of the signal peptide. This may be necessary to ensure proper cleavage
of the signal peptide. For PR1b the synthetic DNA sequence also included the first 10 amino acids of mature PR1b.
For PvuB the synthetic DNA sequence included the first 13 amino acids of mature PvuB. Both synthetic signal peptide
encoding segments ended with NcoI sites to allow fusion in frame to the methionine initiation codon of the synthetic
B.t. genes.

[0154] Four vectors encoding synthetic B.t.k. HD-73 genes were constructed containing these signal peptides. The
synthetic truncated HD-73 gene from pMON5383 was fused with the signal peptide sequence of PvuB and incorporated
into pMON893 to create pMON10827. The synthetic truncated HD-73 gene from pMON5383 was also fused with the
signal peptide sequence of PR1b to create pMON10824. The full length synthetic HD-73 gene from pMON10518 was
fused with the signal peptide sequence of PvuB and incorporated into pMON893 to create pMON10828. The full length
synthetic HD-73 gene from pMON10518 was also fused with the signal peptide sequence of PR1b and incorporated
into pMON893 to create pMON10825.

[0155] These vectors were used to transform tobacco plants and the plants were assayed for expression of the
B.t.k. protein by Western blot analysis and for insecticidal efficacy. pMON10824 and pMON10827 produced amounts of
B.t.k. protein in leaf comparable to the truncated HD-73 vectors, pMON5383 and pMON5390. pMON10825 and
pMON10828 produced full length B.t.k. protein in amounts comparable to pMON10518. In all cases, the plants were
insecticidally active against tobacco hornworm.

Claims

1. A method for modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus
thuringiensis to enhance the expression of said protein in plants which comprises:

a) identifying regions within said sequence with greater than four consecutive adenine or thymine nucleotides;

b) modifying the regions of step (a) which have two or more polyadenylation signals within a ten base sequence
to remove said signals while maintaining a gene sequence which encodes said protein; and

c) modifying the 15-30 base regions surrounding the regions of step (a) to remove major plant polyadenylation
signals, consecutive sequences containing more than one minor polyadenylation signal and consecutive
sequences containing more than one ATTTA sequence while maintaining a gene sequence which encodes said
protein.

2. A method for modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus
thuringiensis to enhance the expression of said protein in plants which comprises:

a) removing polyadenylation signals contained in said wild-type gene while retaining a sequence which
encodes said protein; and

b) removing ATTTA sequences contained in said wild-type gene while retaining a sequence which encodes
said protein.

3. A method of claim 2 further comprising the removal of self-complementary sequences and replacement of such
sequences with nonself-complementary DNA comprising plant preferred codons while retaining a structural gene
sequence encoding said protein.

4. A method of claims 1 to 3 further comprising the use of plant preferred sequences in the removal of the
polyadenylation signals and ATTTA sequences.

5. A method of claims 1 to 3 in which the plant polyadenylation signals are selected from the group consisting of
AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT,
AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.
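
The sixteen signals listed in claim 5, together with the ATTTA sequence of claims 1 and 2, lend themselves to a simple motif scan. The following sketch shows one way a reader might locate them in a coding sequence; `find_motifs` and the test fragment are illustrative assumptions and are not taken from the specification.

```python
# The sixteen plant polyadenylation signals listed in claim 5.
POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)


def find_motifs(seq, motifs):
    """Return sorted (position, motif) pairs; positions are 1-based.

    Overlapping occurrences are reported, since neighbouring signals
    can share bases.
    """
    seq = seq.upper()
    hits = []
    for motif in motifs:
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos + 1, motif))
            pos = seq.find(motif, pos + 1)
    return sorted(hits)


fragment = "GGAATAAAGCATTTACC"  # hypothetical fragment, not from the patent
print(find_motifs(fragment, POLYA_SIGNALS))
print(find_motifs(fragment, ("ATTTA",)))
```

The scan only reports where motifs occur. Choosing replacement codons that remove a motif while preserving the encoded protein is a separate step that this sketch does not attempt.
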


6. A method for improving the expression of a heterologous gene in plants wherein said gene comprises a modified
chimeric gene containing a promoter which functions in plant cells operably linked to a structural coding sequence
and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition
of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an
insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis protein, wherein said method
comprises modifying said structural coding sequence so that said sequence has a DNA sequence which differs
from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and said structural coding
sequence does not contain more than 5 consecutive nucleotides consisting of either adenine or thymine residues.

7. A method for improving the expression of a heterologous gene in plants wherein said gene comprises a modified
chimeric gene containing a promoter which functions in plant cells operably linked to a structural coding sequence
and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition
of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an
insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis protein, wherein said method
comprises modifying said structural coding sequence so that said sequence has a DNA sequence which differs
from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and has the following
characteristics:

said structural coding sequence has a region which is complementary to the following sequence:

GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC
1   5    10   15   20   25   30   35   40   45

said region in said coding sequence having eliminated 2 AACCAA and 1 AATTAA sequence.
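
The region recited in claim 7 can be examined by taking the reverse complement of the 48-base sequence and confirming that it contains neither AACCAA nor AATTAA. This is a sketch under the assumption that "complementary" refers to the antiparallel (reverse) complement on the coding strand; the helper name `reverse_complement` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string restricted to A, C, G and T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# The 48-base sequence recited in claim 7.
OLIGO = "GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC"

region = reverse_complement(OLIGO)
print(region)
# Neither of the two signals named in the claim should remain in the region.
print("AACCAA" in region, "AATTAA" in region)
```

Under that reading, the computed region can also be searched for in a candidate coding sequence with an ordinary substring test.
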

8. A method according to claim 7, wherein said structural coding sequence encodes an insecticidal protein at least
a portion of which was derived from a Bacillus thuringiensis kurstaki HD-1.

9. A method according to claim 7 or 8, wherein the plant is a tobacco plant.

10. A modified chimeric gene containing a promoter which functions in plant cells operably linked to a structural coding
sequence and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause
the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence
encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis protein, wherein
said structural coding sequence has a DNA sequence which differs from the naturally occurring DNA sequence
encoding said Bacillus thuringiensis protein and is selected from:

A. A structural gene which encodes an insecticidal protein of B.t.k. HD-1 having the sequence:

   1 ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCCT 40
  41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80
  81 TGCTGGATTTGTGTTAGGACTAGTTGATATTATCTGGGGA 120
 121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160
 161 TTGAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAG 200
 201 GAATCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240
 241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280
 281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320
 321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360
 361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTCCTCTCCG 400
 401 TGTACGTTCAAGCTGCCAACCTCCACCTCTCAGTTTTGAG 440
 441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480
 481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520
 521 TTGGCAACTATACAGATCATGCTGTACGCTGGTACAATAC 560
 561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600
 601 ATCAGGTACAACCAGTTCAGAAGAGAGCTTACACTAACTG 640
 641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 680
 681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720
 721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760
 761 TTCGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAG 800
 801 TCCACATTTGATGGATATACTTAATAGTATAACCATCTAT 840
 841 ACGGATGCTCATAGAGGAGAATACTACTGGTCCGGTCACC 880
 881 AGATCATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920
 921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960
 961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000
1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAACAT 1040
1041 CGGGATCAACAACCAACAACTATCTGTTCTTGACGGGACA 1080
1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120
1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160
1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200
1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240
1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280
1281 CTCTTGGATACATCGTAGTGCTGAGTTCAACAACATCATC 1320
1321 CCTTCATCACAAATCACCCAAATCCCACTCACCAAGTCTA 1360
1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400
1401 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 1440
1441 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAT 1480
1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520
1521 AAACCTTCAGTTCCACACATCAATTGACGGAAGACCTATT 1560
1561 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600
1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640
1641 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680
1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720
1721 ATCGAATTGAATTTGTTCCGGCA 1743.
B. A structural gene which encodes an insecticidal protein of B.t.k. HD-73 having the sequence:

   1 ATGGCCATTGAAACCGGTTACACTCCCATCGACATCTCCT 40
  41 TGTCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGG 80
  81 TGCTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGT 120
 121 ATCTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAA 160
 161 TTGAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAG 200
 201 GAACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTC 240
 241 TACCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCG 280
 281 ATCCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCA 320
 321 ATTCAACGACATGAACAGCGCCTTGACCACAGCTATCCCA 360
 361 TTGTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCG 400
 401 TGTACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCG 440
 441 AGACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCT 480
 481 GCAACCATCAATAGCCGTTACAACGACCTTACTAGGCTGA 520
 521 TTGGAAACTACACCGACCACGCTGTTCGTTGGTACAACAC 560
 561 TGGCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGG 600
 601 ATTAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAG 640
 641 TTTTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAG 680
 681 AACCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAA 720
 721 ATCTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCT 760
 761 TCCGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAG 800
 801 CCCACACTTGATGGACATCTTGAACAGCATAACTATCTAC 840
 841 ACCGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACC 880
 881 AGATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTT 920
 921 TACCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCA 960
 961 CAACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACA 1000
1001 GAACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATAT 1040
1041 CGGTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACA 1080
1081 GAGTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTG 1120
1121 TTTACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAAT 1160
1161 CCCACCACAGAACAACAATGTGCCACCCAGGCAAGGATTC 1200
1201 TCCCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGAT 1240
1241 TCAGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTT 1280
1281 CTCTTGGATACACCGTAGTGCTGAGTTCAACAACATCATC 1320
1321 GCATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360
1361 ACTTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATT 1400
1401 CACTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAAT 1440
1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480
1481 TCCCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTA 1520
1521 TGCTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGT 1560
1561 AATTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTA 1600
1601 CCTCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTT 1640
1641 TGAAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATC 1680
1681 GTGGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTA 1720
1721 TCGACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGA 1760
1761 GGCTGAG 1767,

C. A structural gene encoding an insecticidal protein of B.t.k. HD-1 having the sequence:

   1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
  41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
  81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
 121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
 161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
 201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
 241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
 281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
 321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
 361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
 401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
 441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
 481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
 521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
 561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
 601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
 641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
 681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
 721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
 761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
 801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
 841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
 881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
 921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
 961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400
1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440
1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560
1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680
1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCTTCGAGGCTG 1840
1841 AGTAC 1845.

D. A structural gene encoding an Insecticidal protein derived from Rf.fc HD-73 having the sequence: 

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA .40 

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA . . 80 

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120 

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160 

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT . 200 
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201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240 

• ■ • • • • 

241 G AGCAGTTG AT C AACCAG AGG ATCG AAGAGTTCG CC AGGA 280 

281 AC C AGGC C ATCTC T AGGTTGGAAGG ATTGAGC AATCTCTA 320 

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360 

3 61 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400 

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560 

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640 

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680 

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800

801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880 

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960 

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080 

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200 

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280 

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320 

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360 

1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440

1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480 

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560

1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600

1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640 

1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680 

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720 

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760

1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800 

1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTAATGCG 1880 

1881 CTGTTTACGTCTACAAACCAGCTTGGACTCAAGACAAATG 1920

1921 G 1921.

E. A structural gene encoding the full-length insecticidal protein of B.t.k. HD-73 having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40 

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80 

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200 

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280 

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320 

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360 

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400 

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480 

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520 

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560 

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680 

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720 

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760 

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800

801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160 

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200 
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440

1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 

1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600

1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640

1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680 

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720 

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760

1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800



1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 

1881 GCTGTTTACGTCTACAAACCAGCTCGGCCTCAAGACCAAT 1920 

1921 GTGACGGATTATCATATTGATCAAGTGTCCAACTTGGTGA 1960 

1961 CCTACCTCAGCGATGAGTTCTGTCTGGATGAAAAGCGAGA 2000 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040 

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120

2121 TACCATCCAGGGAGGTGACGACGTGTTCAAGGAGAACTAC 2160

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 

2201 ACCTCTACCAGAAGATCGACGAGTCCAAGTTGAAAGCCTT 2240

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280 

2281 GACCTCGAGATCTACCTCATCCGCTACAATGCAAAACATG 2320 

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400 
2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560 

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAGAAGT 2680 

2681 TGGAATGGGAGACCAACATCGTCTACAAAGAGGCAAAAGA 2720 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760 

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840 

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTCTACG 2920

2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 
3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400 

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520



3521 TCCTTATGGAGGAA 3534. 

F. A structural gene encoding a full-length insecticidal protein of B.t.k. HD-73 having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160 

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240 

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440 

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520 

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640 

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800

801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840 

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920 

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960 

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080 

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240 

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280 

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320 

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440 

1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480 

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 

1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600

1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640

1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680 

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760 

1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800 

1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120

2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 
2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320 
2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560 

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600 

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640 

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760 

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT. 3240 

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400 

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520 

3521 TCCTTATGGAGGAA 3534. 

G. A structural gene encoding a full-length insecticidal protein of B.t.k. HD-73 having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80 

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200 

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280 

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320 

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480 

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560 

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640 

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680 

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720 

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760 

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800

801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840 

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880 

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920 

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200 

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280 

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440

1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480 

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560

1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600 

1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640 
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720 

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760

1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800

1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840

1841 CTGAGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGC 1880 

1881 CCTCTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAAC 1920 

1921 GTTACTGACTATCACATTGACCAAGTGTCCAACTTGGTCA 1960

1961 CCTACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGA 2000

2001 ACTCTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGAC 2040 

2041 GAGAGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCA 2080 

2081 ACAGGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGAT 2120 

2121 CACCATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTAC 2160 

2161 GTCACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCT 2200 

2201 ACTTGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTT 2240
2241 CACCAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAA 2280

2281 GACCTTGAAATCTACTCGATCAGGTACAATGCCAAGCACG 2320

2321 AGACCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACT 2360

2361 TTCTGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAAC 2400

2401 AGATGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACT 2440

2441 GCTCCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCA 2480

2481 TCACTTCTCCTTGGACATCGATGTGGGATGTACTGACCTG 2520

2521 AATGAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGA 2560

2561 CCCAAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCT 2600

2601 CGAAGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTG 2640

2641 AAGAGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAAC 2680

2681 TCGAATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGA 2720

2721 GTCCGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAG 2760

2761 TTGCAAGCCGACACCAACATCGCCATGATCCACGCCGCAG 2800

2801 ACAAACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGA 2840

2841 GTTGTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAG 2880

2881 GAACTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACG 2920

2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960 

2961 CCTCAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAG 3000 

3001 GAACAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGT 3040

3041 GGGAAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGG 3080 

3081 TAGAGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGA 3120 

3121 TACGGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACA 3160 

3161 ACACCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGA 3200

3201 AATCTATCCCAACAACACCGTTACTTGCAACGACTACACT 3240

3241 GTGAATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTA 3280

3281 ACAGAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTA 3320

3321 TGCCTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGA 3360

3361 CGTGAGAACGCTTGCGAGTTCAACAGAGGTTACAGGGACT 3400

3401 ACACACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGA 3440 
3441 GTACTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGT 3480 

3481 GAAACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTC 3520 

3521 TCTTGATGGAGGAA 3534. 

H. A structural gene which encodes an insecticidal protein of B.t.t. having the sequence:

1 ATGACTGCAGACAACAACACCGAAGCCCTCGACAGTTCTA 40

41 CCACTAAGGATGTTATCCAGAAGGGTATCTCCGTTGTGGG 80 

81 AGACCTCTTGGGCGTGGTTGGATTTCCCTTCGGTGGAGCC 120 

121 CTCGTGAGCTTCTATACAAACTTTCTCAACACCATTTGGC 160

161 CAAGCGAGGACCCTTGGAAAGCATTCATGGAGCAAGTTGA 200

201 AGCTCTTATGGATCAGAAGATTGCAGATTATGCCAAGAAC 240

241 AAGGCTTTGGCAGAACTCCAGGGCCTTCAGAACAATGTGG 280

281 AGGACTACGTGAGTGCATTGTCCAGCTGGCAGAAGAACCC 320 

321 TGTTAGCTCCAGAAATCCTCACAGCCAAGGTAGGATCAGA 360

361 GAGTTGTTCTCTCAAGCCGAATCCCACTTCAGAAATTCCA 400 
401 TGCCTAGCTTTGCTATCTCCGGTTACGAGGTTCTTTTCCT 440

441 CACTACCTATGCTCAAGCTGCCAACACCCACTTGTTTCTC 480

481 CTTAAGGACGCTCAAATCTATGGAGAAGAGTGGGGATACG 520

521 AGAAAGAGGACATTGCTGAGTTCTACAAGCGTCAACTTAA 560

561 GCTCACCCAAGAGTACACTGACCATTGCGTGAAATGGTAT 600

601 AACGTTGGTCTCGATAAGCTCAGAGGCTCTTCCTACGAGT 640 

641 CTTGGGTGAACTTCAACAGATACAGGAGAGAGATGACCTT 680

681 GACTGTGCTCGATCTTATCGCACTCTTTCCCTTGTACGAT 720

721 GTGAGACTCTACCCAAAGGAAGTGAAAACTGAGCTTACCA 760 

761 GAGACGTGCTCACTGACCCTATTGTCGGAGTCAACAACCT 800 

801 TAGGGGTTATGGAACTACCTTCAGCAATATCGAAAACTAC 840

841 ATTAGGAAACCACATCTCTTCGACTATCTTCACAGAATTC 880 

881 AATTCCACACAAGGTTTCAACCAGGATACTATGGTAACGA 920

921 CTCCTTCAACTATTGGTCCGGTAACTATGTTTCCACCAGA 960 

961 CCAAGCATTGGATCTAATGACATCATCACATCTCCCTTCT 1000 
1001 ATGGTAACAAGTCCAGTGAACCTGTGCAGAACCTTGAGTT 1040

1041 CAACGGCGAGAAAGTCTATAGAGCCGTCGCAAACACCAAT 1080

1081 CTCGCTGTGTGGCCATCCGCAGTTTACTCAGGCGTCACAA 1120

1121 AGGTGGAGTTTAGTCAGTATAACGATCAGACCGATGAGGC 1160

1161 CAGCACCCAGACTTACGACTCCAAACGTAACGTTGGCGCA 1200

1201 GTCTCTTGGGATTCTATCGACCAATTGCCTCCAGAAACCA 1240

1241 CAGACGAACCATTGGAGAAGGGCTACAGCCACCAACTTAA 1280

1281 CTATGTGATGTGCTTCTTGATGCAAGGTTCCAGAGGGACC 1320

1321 ATTCCAGTGTTGACCTGGACACACAAGTCCGTGGACTTCT 1360

1361 TCAACATGATCGATAGCAAGAAGATCACTCAACTTCCCTT 1400

1401 GGTGAAAGCCTACAAGCTGCAATCTGGTGCTTCCGTTGTC 1440

1441 GCAGGTCCCAGATTCACTGGAGGTGACATCATCCAGTGCA 1480

1481 CAGAGAACGGCAGCGCAGCTACTATCTACGTGACACCTGA 1520 

1521 TGTGTCTTACTCTCAGAAGTACAGGGCACGTATTCATTAC 1560 

1561 GCATCTACCAGCCAGATCACCTTCACACTCAGCTTGGATG 1600 
1601 GAGCACCCTTCAACCAGTATTACTTTGACAAGACCATCAA 1640

1641 CAAAGGTGACACTCTCACATACAATAGCTTCAACTTGGCA 1680

1681 AGTTTCAGCACACCATTTGAACTCTCAGGCAACAATCTTC 1720

1721 AGATCGGCGTCACCGGTCTCAGCGCCGGAGACAAAGTCTA 1760 

1761 CATCGACAAGATTGAGTTCATCCCAGTGAAC 1791. 
I. A structural gene which encodes an insecticidal protein of B.t. entomocidus having the sequence:

1 ATGGAGGAGAACAACCAAAACCAATGCATTCCATACAACT 40

41 GCTTGAGTAACCCAGAAGAGGTATTGCTTGATGGAGAACG 80

81 CATTTCAACCGGTAACTCTTCCATCGACATCTCCTTGTCC 120 

121 TTGGTCCAGTTTCTGGTCAGCAACTTCGTGCCAGGTGGTG 160 

161 GGTTCCTTGTCGGACTAATTGACTTCGTCTGGGGTATCGT 200 

201 TGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATTGAG 240 

241 CAGTTGATCAACGAGAGGATCGCTGAGTTCGCCAGGAACG 280 

281 CTGCCATCGCTAACTTGGAAGGATTGGGCAATAACTTCAA 320 
321 CATCTATGTGGAGGCCTTCAAAGAGTGGGAAGAGGACCCT 360

361 AACAACCCAGAGACCCGCACTAGGGTGATCGACAGATTCA 400

401 GAATCTTGGACGGCCTCTTGGAGAGAGATATCCCATCCTT 440

441 CAGAATCTCTGGCTTCGAAGTTCCTCTCTTGTCCGTGTAC 480

481 GCTCAAGCAGCTAATCTTCACCTCGCTATCCTTCGAGACA 520

521 GTGTCATCTTTGGGGAAAGGTGGGGATTGACCACTATCAA 560

561 CGTCAATGAGAATTACAACAGACTTATCAGGCACATTGAC 600

601 GAGTACGCCGACCACTGTGCTAACACCTACAACCGTGGCT 640

641 TGAACAATCTCCCTAAGTCTACTTATCAAGATTGGATTAC 680

681 CTACAACAGGTTGAGGAGAGACTTGACCCTCACAGTTTTG 720

721 GACATTGCAGCTTTCTTCCCGAACTATGACAACAGGAGAT 760

761 ACCCTATCCAACCAGTGGGTCAACTTACCAGAGAAGTCTA 800

801 TACTGACCCACTTATCAACTTCAACCCTCAGTTGCAAAGT 840

841 GTCGCCCAACTTCCCACATTCAACGTCATGGAGTCCAGCC 880

881 GTATCAGGAACCCACACTTGTTTGACATCTTGAACAACCT 920

921 TACTATCTTCACCGATTGGTTCAGCGTTGGGCGTAACTTC 960

961 TATTGGGGTGGACACAGGGTCATCTCCTCTCTTATTGGAG 1000 

1001 GTGGGAACATTACCTCTCCTATCTATGGACGTGAGGCAAA 1040 

1041 CCAGGAGCCACCACGTAGTTTCACCTTCAACGGTCCAGTC 1080

1081 TTCAGAACCTTGTCTAACCCTACCTTGAGATTGCTCCAGC 1120 

1121 AACCTTGGCCAGCTCCACCTTTCAACCTTAGAGGTGTTGA 1160 

1161 GGGCGTTGAGTTCTCTACTCCTACCAACTCCTTCACTTAC 1200 

1201 AGAGGTAGAGGAACCGTTGATTCCTTGACCGAACTCCCAC 1240 

1241 CAGAGGACAATAGCGTGCCACCCAGGGAAGGCTACTCCCA 1280 

1281 CAGGTTGTGCCACGCAACCTTCGTGCAGCGTTCCGGAACT 1320 

1321 CCATTCCTCACTACAGGAGTTGTGTTCTCATGGACTGATC 1360

1361 GTAGTGCTACTCTCACTAATACCATTGATCCCGAGAGGAT 1400 

1401 CAATCAAATCCCATTGGTCAAGGGTTTCCGTGTGTGGGGA 1440

1441 GGAACTTCTGTCATCACAGGACCAGGCTTCACAGGAGGTG 1480

1481 ATATTCTTAGAAGAAACACTTTTGGCGACTTTGTGAGCCT 1520

1521 CCAAGTTAACATCAACTCTCCAATTACTCAAAGATATCGT 1560

1561 CTCAGGTTTCGTTACGCATCTTCCCGTGACGCTAGAGTCA 1600 

1601 TCGTGCTCACCGGAGCAGCTTCTACCGGTGTCGGTGGACA 1640 

1641 AGTCTCCGTGAACATGCCACTCCAGAAGACTATGGAGATC 1680

1681 GGCGAGAACTTGACATCCAGGACCTTCAGATACACCGACT 1720

1721 TCTCTAACCCTTTCAGTTTCCGTGCCAACCCTGACATCAT 1760

1761 TGGCATTAGCGAACAACCTCTCTTTGGAGCTGGTAGCATC 1800

1801 TCATCTGGCGAATTGTACATTGACAAGATTGAGATCATTC 1840

1841 TTGCCGACGCTACCTTCGAGGCTGAGTCTGACCTTGAGAG 1880 

1881 AGCCCAGAAGGCTGTGAACGCCCTCTTTACCTCCTCTAAT 1920 

1921 CAGATTGGCTTGAAAACTGACGTTACTGACTATCACATTG 1960

1961 ACCAAGTGTCCAACTTGGTCGACTGCCTTAGCGATGAGTT 2000 

2001 CTGCCTCGACGAGAAGCGTGAACTCTCCGAGAAAGTTAAA 2040 

2041 CACGCCAAGCGTCTCAGCGACGAGAGGAATCTCTTGCAAG 2080 

2081 ACCCCAACTTCAGAGGCATCAACAGGCAGCCAGACCGTGG 2120 
2121 TTGGAGAGGAAGCACCGACATCACCATCCAAGGAGGCGAC 2160

2161 GATGTGTTCAAGGAGAACTACGTCACCCTCCCAGGAACTG 2200

2201 TGGACGAGTGCTACCCTACCTACTTGTACCAGAAGATCGA 2240 

2241 TGAGTGCAAACTCAAAGCCTACACCAGGTATGAACTTAGA 2280 

2281 GGCTACATCGAAGACAGCCAAGACCTTGAAATCTACCTCA 2320

2321 TCAGGTACAATGCCAAGCACGAGATCGTGAATGTCCCAGG 2360

2361 TACTGGTTCCCTCTGGCCACTTTCTGCCCAAATGCCCATT 2400

2401 GGGAAGTGTGGAGAGCCTAACAGATGCGCTCCACACCTTG 2440

2441 AGTGGAATCCTGACTTGGACTGCTCCTGCAGGGATGGCGA 2480

2481 GAAGTGTGCCCACCATTCTCATCACTTCACCTTGGACATC 2520 

2521 GATGTGGGATGTACTGACCTGAATGAGGACCTCGGAGTCT 2560 

2561 GGGTCATCTTCAAGATCAAGACCCAAGACGGACACGCAAG 2600 

2601 ACTTGGCAACCTTGAGTTTCTCGAAGAGAAACCATTGCTC 2640 

2641 GGTGAAGCTCTCGCTCGTGTGAAGAGAGCAGAGAAGAAGT 2680

2681 GGAGGGACAAACGTGAGAAACTCCAACTCGAGACTAACAT 2720 
2721 CGTTTACAAGGAGGCCAAAGAGTCCGTGGATGCTTTGTTC 2760

2761 GTGAACTCCCAATATGATAGGTTGCAAGTGGACACCAACA 2800

2801 TCGCCATGATCCACGCTGCAGACAAACGTGTGCACAGGAT 2840

2841 TCGTGAGGCTTACTTGCCTGAGTTGTCCGTGATCCCTGGT 2880 

2881 GTGAACGCTGCCATCTTCGAGGAACTTGAGGGACGTATCT 2920 

2921 TTACCGCATACTCCTTGTACGATGCCAGAAACGTCATCAA 2960

2961 GAACGGTGACTTCAACAATGGCCTCTTGTGCTGGAATGTG 3000 

3001 AAAGGTCATGTGGACGTGGAGGAACAGAACAATCACCGTT 3040

3041 CCGTCCTGGTTATCCCTGAGTGGGAAGCTGAAGTGTCCCA 3080 

3081 AGAGGTTAGAGTCTGTCCAGGTAGAGGCTACATTCTCCGT 3120 

3121 GTGACCGCTTACAAGGAGGGATACGGTGAGGGTTGCGTGA 3160

3161 CCATCCACGAGATCGAGGACAACACCGACGAGCTTAAGTT 3200 

3201 CTCCAACTGCGTCGAGGAAGAAGTCTATCCCAACAACACC 3240

3241 GTTACTTGCAACAACTACACTGGGACCCAGGAAGAGTACG 3280 

3281 AAGGTACCTACACTAGCCGTAACCAAGGTTACGACGAAGC 3320

3321 TTACGGAAACAATCCTTCCGTTCCTGCTGACTATGCCTCC 3360

3361 GTGTACGAGGAGAAATCCTACACAGATGGCAGACGTGAGA 3400 

3401 ACCCTTGCGAGTCCAACAGAGGTTACGGTGACTACACACC 3440 

3441 ACTTCCAGCAGGCTATGTTACCAAGGACCTTGAGTACTTT 3480

3481 CCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAAACCG 3520

3521 AGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCTTGAT 3560 



3561 GGAGGAA 3567. 

J. A structural gene which encodes a P2 insecticidal protein having the sequence:



1 ATGGACAACAACGTCTTGAACTCTGGTAGAACAACCATCT 40

41 GCGACGCATACAACGTCGTGGCTCACGATCCATTCAGCTT 80

81 CGAACACAAGAGCCTCGACACTATTCAGAAGGAGTGGATG 120

121 GAATGGAAACGTACTGACCACTCTCTCTACGTCGCACCTG 160 

161 TGGTTGGAACAGTGTCCAGCTTCCTTCTCAAGAAGGTCGG 200

201 CTCTCTCATCGGAAAACGTATCTTGTCCGAACTCTGGGGT 240

241 ATCATCTTTCCATCTGGGTCCACTAATCTCATGCAAGACA 280

281 TCTTGAGGGAGACCGAACAGTTTCTCAACCAGCGTCTCAA 320

321 CACTGATACCTTGGCTAGAGTCAACGCTGAGTTGATCGGT 360 

361 CTCCAAGCAAACATTCGTGAGTTCAACCAGCAAGTGGACA 400 

401 ACTTCTTGAATCCAACTCAGAATCCTGTGCCTCTTTCCAT 440 

441 CACTTCTTCCGTGAACACTATGCAGCAACTCTTCCTCAAC 480

481 AGATTGCCTCAGTTTCAGATTCAAGGCTACCAGTTGCTCC 520 

521 TTCTTCCACTCTTTGCTCAGGCTGCCAACATGCACTTGTC 560 

561 CTTCATACGTGACGTGATCCTCAACGCTGACGAATGGGGA 600

601 ATCTCTGCAGCCACTCTTAGGACATACAGAGACTACTTGA 640

641 GGAACTACACTCGTGATTACTCCAACTATTGCATCAACAC 680 

681 TTATCAGACTGCCTTTCGTGGACTCAATACTAGGCTTCAC 720 

721 GACATGCTTGAGTTCAGGACCTACATGTTCCTTAACGTGT 760 

761 TTGAGTACGTCAGCATTTGGAGTCTCTTCAAGTACCAGAG 800

801 CTTGATGGTGTCCTCTGGAGCCAATCTCTACGCCTCTGGC 840 
841 AGTGGACCACAGCAAACTCAGAGCTTCACAGCTCAGAACT 880

881 GGCCATTCTTGTATAGCTTGTTCCAAGTCAACTCCAACTA 920 

921 CATTCTCAGTGGTATCTCTGGGACCAGACTCTCCATAACC 960

961 TTTCCCAACATTGGTGGACTTCCAGGCTCCACTACAACCC 1000 

1001 ATAGCCTTAACTCTGCCAGAGTGAACTACAGTGGAGGTGT 1040 

1041 CAGCTCTGGATTGATTGGTGCAACTAACTTGAACCACAAC 1080

1081 TTCAATTGCTCCACCGTCTTGCCACCTCTGAGCACACCGT 1120 

1121 TTGTGAGGTCCTGGCTTGACAGCGGTACTGATCGCGAAGG 1160 

1161 AGTTGCTACCTCTACAAACTGGCAAACCGAGTCCTTCCAA 1200

1241 GGAATTCAAACTACTTTCCAGACTACTTCATTAGGAACAT 1280 

1281 CTCTGGTGTTCCTCTCGTCATCAGGAATGAAGACCTCACC 1320 

1321 CGTCCACTTCATTACAACCAGATTAGGAACATCGAGTCTC 1360 

1361 CATCCGGTACTCCAGGAGGTGCAAGAGCTTACCTCGTGTC 1400

1401 TGTCCATAACAGGAAGAACAACATCTACGCTGCCAACGAG 1440 
1441 AATGGCACCATGATTCACCTTGCACCAGAAGATTACACTG 1480

1481 GATTCACCATCTCTCCAATCCATGCTACCCAAGTGAACAA 1520 

1521 TCAGACACGCACCTTCATCTCCGAAAAGTTCGGAAATCAA 1560

1561 GGTGACTCCTTGAGGTTCGAGCAATCCAACACTACCGCTA 1600 

1601 GGTACACTTTGAGAGGCAATGGAAACAGCTACAACCTTTA 1640 

1641 CTTGAGAGTTAGCTCCATTGGTAACTCCACCATCCGTGTT 1680

1681 ACCATCAACGGACGTGTTTACACAGTCTCTAATGTGAACA 1720

1721 CTACAACGAACAATGATGGCGTTAACGACAACGGAGCCAG 1760

1761 ATTCAGCGACATCAACATTGGCAACATCGTGGCCTCTGAC 1800

1801 AACACTAACGTTACTTTGGACATCAATGTGACCCTCAATT 1840

1841 CTGGAACTCCATTTGATCTCATGAACATCATGTTTGTGCC 1880

1881 AACTAACCTCCCTCCATTGTAC 1902; or



K. A structural gene sequence encoding a fusion protein comprising the N-terminal 610 amino acids of B.t.k. HD-1 and the C-terminal 567 amino acids of B.t.k. HD-73, said gene having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40 

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80 

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240 

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280 

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320 

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360 

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400 

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440 
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720 

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760 

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800

801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840 

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880 

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040 
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120 

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200 

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240 

20 .. 1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280 

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320 

25 • * 

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGXTCT . 1360 

13 61 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC. 1400 

■ 30 

• . •■ • • 

1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACX 1440 

35 1441 AACCTTG GATCTG G AACTTCTGTC GTG AAAGG ACCAGGCT 1480 

1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520 

1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560 . 

46 15 61 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600 

1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640 

so 

«s. ■ . -. 
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1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC. 1.680 

1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCCTCGAGGCTG 1840
1841 AGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGCCCT 1880
1881 CTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAACGTT 1920
1921 ACTGACTATCACATTGACCAAGTGTCCAACTTGGTCACCT 1960
1961 ACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGAACT 2000
2001 CTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGACGAG 2040
2041 AGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCAACA 2080
2081 GGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGATCAC 2120
2121 CATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTACGTC 2160
2161 ACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCTACT 2200
2201 TGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTTCAC 2240
2241 CAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAAGAC 2280

2281 CTTGAAATCTACTCGATCAGGTACAATGCCAAGCACGAGA 2320
2321 CCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACTTTC 2360
2361 TGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAACAGA 2400
2401 TGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACTGCT 2440
2441 CCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCATCA 2480
2481 CTTCTCCTTGGACATCGATGTGGGATGTACTGACCTGAAT 2520
2521 GAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGACCC 2560
2561 AAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCTCGA 2600
2601 AGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTGAAG 2640
2641 AGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAACTCG 2680
2681 AATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGAGTC 2720
2721 CGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAGTTG 2760
2761 CAAGCCGACACCAACATCGCCATGATCCACGCCGCAGACA 2800
2801 AACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGAGTT 2840
2841 GTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAGGAA 2880

2881 CTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACGATG 2920
2921 CCAGAAACGTCATCAAGAACGGTGACTTCAACAATGGCCT 2960
2961 CAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAGGAA 3000
3001 CAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGTGGG 3040
3041 AAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGGTAG 3080
3081 AGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGATAC 3120
3121 GGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACAACA 3160
3161 CCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGAAAT 3200
3201 CTATCCCAACAACACCGTTACTTGCAACGACTACACTGTG 3240
3241 AATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTAACA 3280
3281 GAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTATGC 3320
3321 CTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGACGT 3360
3361 GAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACTACA 3400
3401 CACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGAGTA 3440
3441 CTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAA 3480
3481 ACCGAGGGAACCTTCATCGVGGACAGCGTGGAGCTTCTCT 3520
3521 TGATGGAGGAA 3531,
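
The length of the sequence in item K can be cross-checked against the residue counts stated in the claim. The short Python sketch below is only an illustrative consistency check; it assumes that the listing covers exactly the codons of the fusion protein (no stop codon) and that the printed layout uses 40 bases per line. The variable names are hypothetical.

```python
# Residue counts as stated in item K of the claim.
HD1_N_TERMINAL_RESIDUES = 610
HD73_C_TERMINAL_RESIDUES = 567
BASES_PER_CODON = 3
BASES_PER_LISTING_LINE = 40  # assumption: layout of the printed listing

total_residues = HD1_N_TERMINAL_RESIDUES + HD73_C_TERMINAL_RESIDUES
coding_length = total_residues * BASES_PER_CODON

# Number of complete 40-base lines and the size of the final, partial line.
full_lines, last_line_bases = divmod(coding_length, BASES_PER_LISTING_LINE)

print(total_residues, coding_length, full_lines, last_line_bases)
```

Under these assumptions the computed length agrees with the final position number of the listing, and the last line (starting at position 3521) holds the remaining bases.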



Claims

1. A method for modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus thuringiensis, in order to improve the expression of said protein in plants, which comprises:

a) identifying regions within said sequence having more than four consecutive adenine or thymine nucleotides;

b) modifying the regions of step (a) which have two or more polyadenylation signals within a ten-base sequence so as to remove said signals, while maintaining a gene sequence which encodes said protein; and

c) modifying the 15 to 30 base regions surrounding the regions of step (a) so as to remove major plant polyadenylation signals, consecutive sequences containing more than one minor polyadenylation signal, and consecutive sequences containing more than one ATTTA sequence, while maintaining a gene sequence which encodes said protein.
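
Step (a) of claim 1 amounts to a scan for A/T-rich runs, and step (c) to taking a window around each run. The Python sketch below illustrates one possible reading; it assumes that a "run" may mix adenine and thymine, and it uses a flank of 15 bases (the lower end of the 15 to 30 base range). The function names and the example fragment are hypothetical and not taken from the patent.

```python
import re

def at_rich_runs(seq, min_run=5):
    """Return (start, end, bases) for each run of at least `min_run` A/T bases.

    Positions are 0-based and `end` is exclusive. "More than four
    consecutive" nucleotides corresponds to min_run=5.
    """
    pattern = re.compile(r"[AT]{%d,}" % min_run)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]

def flanking_region(seq_len, start, end, flank=15):
    """Clamp a window of `flank` bases on either side of a run to the sequence."""
    return max(0, start - flank), min(seq_len, end + flank)

example = "GCGCAATTTAGCGC"  # made-up fragment for illustration only
runs = at_rich_runs(example)
window = flanking_region(len(example), runs[0][0], runs[0][1])
print(runs, window)
```

The windows returned by `flanking_region` would then be the regions inspected for polyadenylation signals and ATTTA sequences.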

2. A method for modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus thuringiensis, in order to improve the expression of said protein in plants, which comprises:

a) removing polyadenylation signals contained in said wild-type gene, while maintaining a sequence which encodes said protein; and

b) removing ATTTA sequences contained in said wild-type gene, while maintaining a sequence which encodes said protein.
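
Locating the ATTTA sequences named in step (b) is a plain motif search. A minimal Python sketch follows; it assumes that overlapping occurrences should also be reported, which is why a lookahead pattern is used. The function name is hypothetical.

```python
import re

def find_motif(seq, motif="ATTTA"):
    """Return the 0-based start position of every match, including overlaps."""
    # A zero-width lookahead lets the regex engine report overlapping hits.
    pattern = re.compile("(?=%s)" % re.escape(motif))
    return [m.start() for m in pattern.finditer(seq.upper())]

print(find_motif("ATTTATTTA"))
```

Each reported position marks a site where a synonymous codon change would be needed to remove the motif while keeping the encoded protein unchanged.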

3. A method according to claim 2, which further comprises removing self-complementary sequences and replacing such sequences with non-self-complementary DNA comprising plant-preferred codons, while maintaining a structural gene sequence which encodes said protein.

4. A method according to claims 1 to 3, which further comprises the use of plant-preferred sequences in removing the polyadenylation signals and ATTTA sequences.

5. A method according to claims 1 to 3, in which the plant polyadenylation signals are selected from the group consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.
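
The sixteen hexamers listed in claim 5 can be used directly as a search table. The Python sketch below is an illustration only: the function name and the example fragment are hypothetical, and the scan makes no distinction between major and minor signals.

```python
# The sixteen plant polyadenylation signals listed in claim 5.
PLANT_POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)

def find_polya_signals(seq):
    """Return sorted (position, signal) pairs for every listed hexamer in seq."""
    seq = seq.upper()
    hits = []
    for signal in PLANT_POLYA_SIGNALS:
        pos = seq.find(signal)
        while pos != -1:
            hits.append((pos, signal))
            pos = seq.find(signal, pos + 1)  # step by one to allow overlaps
    return sorted(hits)

print(find_polya_signals("GGAATAAAGGCATAAAGG"))  # made-up fragment
```

Positions are 0-based; two hits whose positions lie within ten bases of each other would correspond to the clustered signals addressed in claim 1, step (b).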

6. A method for improving the expression of a heterologous gene in plants, said gene comprising a modified chimeric gene containing a promoter which functions in plant cells, operably linked to a structural coding sequence and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein the structural coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis protein, the method comprising modifying said structural coding sequence so that said sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and said structural coding sequence has no more than 5 consecutive nucleotides consisting of either adenine or thymine residues.
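
The limit in claim 6 can be expressed as a simple validation of a candidate coding sequence. The Python sketch below assumes that a run may mix adenine and thymine residues (the stricter of the two possible readings); the function names are hypothetical.

```python
from itertools import groupby

def longest_at_run(seq):
    """Length of the longest stretch made up only of A and/or T bases."""
    return max(
        (len(list(group))
         for is_at, group in groupby(seq.upper(), key=lambda base: base in "AT")
         if is_at),
        default=0,
    )

def within_at_limit(seq, limit=5):
    """True if no A/T stretch in seq is longer than `limit` bases."""
    return longest_at_run(seq) <= limit

# First 40 bases of one of the modified genes listed in this document.
first_line = "ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA"
print(longest_at_run(first_line), within_at_limit(first_line))
```

A sequence containing six consecutive adenines, for example, would fail the check.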


7. A method for improving the expression of a heterologous gene in plants, said gene comprising a modified chimeric gene containing a promoter which functions in plant cells, operably linked to a structural coding sequence and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis protein, the method comprising modifying said structural coding sequence so that said sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding the Bacillus thuringiensis protein and has the following features:

said structural coding sequence has a region which is complementary to the following sequence

GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC
1   5   10   15   20   25   30   35   40   45

wherein 2 AACCAA sequences and 1 AATTAA sequence are eliminated in the coding sequence of said region.
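
The complementarity stated in claim 7 can be checked by taking the reverse complement of the 48-base sequence and searching for it in a coding sequence. The Python sketch below does this against positions 161 to 240 of the sequence listed under item A of claim 10; reading "complementary" as "reverse complement" is an assumption, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Sequence given in claim 7.
claim7_sequence = "GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC"

# Positions 161-240 of the sequence listed under item A of claim 10.
item_a_161_240 = (
    "TTGAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAG"
    "GAATCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT"
)

coding_region = reverse_complement(claim7_sequence)
print(coding_region)
print(coding_region in item_a_161_240)
print("AACCAA" in coding_region, "AATTAA" in coding_region)
```

Under this reading, the coding-strand region contains neither AACCAA nor AATTAA, which is consistent with the elimination of those signals stated in the claim.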

8. A method according to claim 7, wherein the structural coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis kurstaki HD-1.

9. A method according to claim 7 or 8, wherein the plant is a tobacco plant.

10. A modified chimeric gene containing a promoter which functions in plant cells, operably linked to a structural coding sequence and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least a portion of which is derived from a Bacillus thuringiensis protein, wherein said structural coding sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and is selected from:

A. a structural gene which encodes an insecticidal protein of B.t.k. HD-1, having the sequence:
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1 ATGGCTATAGAAACTGGTTACACCCC^TCGATATTTCCT 40 

41 TGTCGCTAACGCAATTTCTTrTGAGTGAATTTGTTCCCGG , 80 

81 T GCTGG ATTTGTGTT AGGACTAGTTG ATAIT ATCTG GGGA 120 

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160 

161 TTGAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAG 200 

201 GAATCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240 

241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280 

281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320 

321 ATTC AAT GACATG AACAGTGCC CTTACAAC CGCTATTCCT 360 

• • • «. 

4 01 TGTACGTTCAAGCTGCCAACCTCCACCTCTCAGTTTTGAG 440 

441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480 

481 G CG ACT ATC AATAGTC GTTATAATGATTTAACTAGGCTTA 520 

521 TT GGC AACT AT AC AG AT CATGCTGTACGCTGGTACAATAG 560 

• « • • 

5 61 GGG ATTAGAGCGTGTATGGGG ACCGGATTCTAGAGATTGG 600 

601 AT C AGGT AC AAC C AGTT CAG AAGAG AGCTT ACACTAACTG 640 

641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAfi 680 

681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720 
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721 ATTTATACAAACC C AGT ATTAG AAAATTTT GATGGTAGTT 760 

761 TTCGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAG 800 

801 TCC ACATTTG ATG G ATATACTT AATAGTAT AACC ATCTAT 840 

841 ACGGATGCTCATAGAGGAGAATACTACTGGTCCGGTCACC 880 

881 AGATCATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT .920 

921 C ACTTTT CC GCTAT ATGG AACTATGGG AAATGCAGCTCCA 960 

961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000 

• ■• ' * . . ■ . • 

1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAACAT. 1040 

1041 CGGGATCAACAACCAACAACTATCTGTTCTTGACGGGACA 1080 

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120 

1121 TATACAGAAAAAGCGGAACGGTAG ATTCGCTGGATGAAAT 1160 

1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200 

1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240 

• • » • 

1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280 

1281 CT CTTGGATAC AT CGTAGT GCTG AGTTC AAC AAC AT CATC 1320 

1321 CCTTCATCACAAATCACCCAAATCCCACTCACCAAGTCTA 1360 

1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400 

1401 ATTT ACAGGAGG AG AT ATT CTTCG AAGAAC TTCACCTGGC 1440 
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1441 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAT 1480 
1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520 
1521 AAAC CTTC AGTTC C ACACATCAATTG ACGG AAG ACCTATT 1560 

• "'. •.; 

1561 AATC AG GGGAATTTTTCAG C AACT ATGAGT AGTGGG AGTA 1600 

1601 ATTTACAGTCCGGAWSCTTTAGGACTGTAGGTTTTACTAC 1640 

1641 T CCGTTTAACTTTTCAAATGGATC AAGTGT ATTTACGTTA 1680 

... ' • 

1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTArTATAG 1720 

• " * 

1721 ATCGAATTGAATTTGTOXGGCA 1743^ 

B. a structural gene which encodes an insecticidal protein of B.t.k. HD-73, having the sequence:

1 ATGGCCATTGAAACCGGTTACACTCCCATCGACATCTCCT 40 

• • • . . •■• 

41 TGTCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGG 80 

• • ■ • ■ 
81 TGCTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGT 120 

• V • 

' 121 ATCTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAA 160 

■ •. • • 

161 TTGAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAG 200 

• • • • 

201 G AAC C AGGCC AT CTCT AGGTTGG AAGG ATTG AGCAATCTC 240 

241 TACCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCG 280 

• • • ■ 

281 ATCCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCA 320 






EP0 385962B1 

321 ATTCAACGACATGAACAGCGCCTTGACCACAGCTATCCCA .. 360 
361 TTGTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCG . 400 

401 T GTACGTTC AAGC AGCT AATCTT C ACCTCAGCGTGCTTCG 440 

• ■„' •• 

441 AGACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCT 480 

• * ' • 

481 GCAACCATCAATAGCCGTTACAAC GACCTT ACTAGGCTGA 520. • 

521 TTGGAAACTACAC CG ACCACGCTGTTCGTTGGT ACAACAC 560. 

5 SI TGGCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGG . 600 

601 ATTAG AT AC AACC AGTTCAGGAGAGAATTGACCCTCACAG 640 

• • '.»•.. • • " ' • ' 

• ■ • •♦ • • . 

681 AACCT ACCCTATC CGTAC AGTGTC CCAACTTAC CAGAGAA 720 

721 ATCTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCT 760 

761 TCCGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAG 800 . 

801 CCCACACTTGATGGACATCTTGAACAGCATAACTATCTAC 840 

841 ACCGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACC 880 

881 AGATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTT 920 . 

921 TACCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCA 960 

961 CAACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACA 1000 

1001 GAACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATAT 1040 
1041 CGGTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACA 1080 
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1081 GAGTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTG 

• • • • 
1121 TTTACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAAX 

1161 CCCACCACAGAACAACAATGTGCCACCCAGGCAAGGArrC 

"• • •' * . ' • 

1201 TCCCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGAT 

• • • .. * 
1241 TCAGCAACAGTTC CGTGAGCATCATCAGAGCTCCTATGTT 

• • • 
1281 CTCTTGGATACACCGTAGTGCTGAGTTCAACAACATCATC 

1321 GCATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA, 

1361 ACTTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATT 

1401 CACTGGTGG AGACCTCGTTAGACTCAACAGCAGTGGAAAT 

1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480 

• ■«.-■'•■ • 

1481 TCCCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTA 1520 

1521 TGCTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGT 1560 

• ■ • • • 

1561 AATTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTA 1600 

1601 CCTCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTT 1640 

1641 TGAAAGTGCCAATGCTTTTAGATCTTCACTCGGTAACAIC 1680 

1661 GTGGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTA 1720 

1721 TCGACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGA 1760 

1761 GGCTGAG 1767. 



C. a structural gene which encodes an insecticidal protein of B.t.k. HD-1, having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400
1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440

1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560
1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680
1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCTTCGAGGCTG 1840
1841 AGTAC 1845.

D. a structural gene which encodes an insecticidal protein derived from B.t.k. HD-73, having the sequence:

• «... • • 

1 ATGG AC AAC AACC C AAAC ATC AAC G AATG CATTCCAT ACA 40 

• ■ . . • • 

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80 

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120 

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160 

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200 
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• • • • 

' 201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAAIT 240 

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTirGCCAGGA 280 

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320 

321 CCAAATCTATGCAGAGAGCT7CAGAGAGTGGGAAGCCGA7 360 

3 61 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400.: 

■ ■ • • • * • •'.*-■■ 

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440 

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480 

■ ■ ' . • 

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520 

521 ACGTTAGCGTGHTTGGGCAAAGCaTGGGGATTCGATGCTGC 560; 

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600 

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640 

64 i ■ GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGA? 680 

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720 

721 TTGGACATTGTGTCTCTCTTCCC6AACTATGACTCCAGAA 760 

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800 

" 8 01 CTATACTAACCCAGTTCTTGAGAACTTC6ACGGTAGCTTC 840 

• 'a • • ■ ■ 

841 C GTGGTTCTG C CCAAGGTATCGAAGGCTCC ATCAGG AGCC 880 

• • • . • 

881 C AC ACTT GAT G GACATCTT GAACAGCAT AACT ATCT ACAC . 920 

921 C G ATG CTC ACAGAGG AG AGTATT ACTGGTCTGG ACACCAG 960 
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961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 

.1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 

1121 GT ATC AAC AACCAGC AACT TTCCG TTCTTGACGGAAC AGA 
»■ • • ♦ 

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 

1241 CACCACAGAACAACAATGT GCC ACCCAGGCAAGGATTCTC 

12 8 1 CCACAGG TTGAGC CACGTGTC C ATGTTCCGT TCCGGATTC 

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 

1361 CTTGG ATACACCGTAGTGCTGAGTTC AACAACATCATCGC 

14 01 ATCC GAT AGTATT ACTCAAATC CCT GCAGTG AAGGGAAAC . 

14 41 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 
1481 CTGG TG GAGACCT CGTTAGACTCAACAGCAGTGG AAATAA 
1521 C ATTCAGAAT AGAGG GT ATATTG AAGTTCCAATTCACTTC 

15 61 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 
1 601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 
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1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760 

17 51 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800: 

1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTAATGCG 1880 

• • 

1881 CTGTTTACGTCT AC AAACC AGCTTG GACTCAAG ACAAATG 1920 



1921 G 1921. 



E. a structural gene which encodes the full-length insecticidal protein of B.t.k. HD-73, having the sequence:



1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40. 

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80 

, • • - ' 

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120 

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160 

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200 

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240 

• • • • 

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280 

• • • • 

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320 

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360 

■ • * • 

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400 

■ . • • ■ ■ 

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440 
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441 

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 

52 1 ACGTT AG C GTG TTTGGG C AAAGGTG GGG ATTCG ATGCTGC 

561 AAC C ATC AATAGCCGTT AC AACG AC CTTACTAGGCTGATT 

• • • • • « 
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACSACACTG 

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 

• • • •• • 
681 T AGAT ACAACC AG TTCAGGAGAG AATTG AC C CTCAC AGTT 

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 

801 CT AT ACTAAC C CAGTTCTTG AG AACTTCG ACG GTAG CTTC 
« • • • 

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 

9 61 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 

1041 ACAACGT ATCGTTGCTCAACTAGGTCAGGGTGTCT ACAGA 

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 
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1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 
12 4 1 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 

1321 AGC AAC AGTTCC GTGAG CATCATCAG AGCTCCTATGTTCT 

• .'*■'.'■•'• 1 

". 1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCSC 

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 

• - • • 
1441 TTTCTCTTCAACGGTTCTGrrCATTTCAGGACCAGGATTCA 

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 

15 61 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 
«' . ■ • . •• • 

1 601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 

• . • ' ' . • • 
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 

• • • .« 
1681 T C CTTG G AT AAT CT C CAATCC AGCG ATTT CGGTT ACTTTG 

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 

1 7 61 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 

1801 GACAG ATTC G AG TTCAT TCCAGTTACTGC AACACTCGAGG 
• • • 

1841 CTGAAT ATAATC T GGAAAGAGCGCAG AAGG CGGTG AATGC 

1881 GCTGTTTACGTCTACAAACCAGCTCGGCCTCAAGACCAAT 

1 92 1 GTG ACG GATTATC ATATTG ATCAAGTGTCCAACTT GGTGA 




1961 CCTACCTCAGCGATGAGTTCTGTCTGGATGAAAAGCGAGA 2000 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040 

1Q 2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080 

2081 ATAGGCAACCAG AACGTGGGTGGGGCGGAAGTACAGGGAT . 212 0 

15 2121 TACCATCCAGGGAGGTGACGACGTGTTCAAGGAGAACTAC 2160 

, 2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200. 

2201 ACCTCTACCAGAAGATCGACGAGTCCAAGTTGAAAGCCTT 2240 

.2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280. 

2281 GACCTCGAGATCTACCTCATCCGCTACAATGCAAAACAT6 2320 

m 2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360 . ■ 

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400 

35 2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440 

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA' 2480 

40 . • '•■ 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520 

• .• ■ • 

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560 

46 

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600 

so 2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640 

i * • • 

2641 AAAAG AG CGG AG AAAAAAT GGAG AGAC AAAC GTG AG AAGT 2680 

55 2681 TGGAATGGGAG AC CAAC ATCGTC TAC AAAGAGGC AAAAGA 2720 
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2721 AT CTGT AGAT GCTTTATTTGTAAACTCTCAATATG ATCAA 

• • • • • • 
2761 TTACAAG CGG AT ACG AATATTGC C ATG ATTCATGCGGCAG 

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 

• ■ • • •/ • 

2 8 a 1 G AATTAGAAGGGCGTATTTTCACTGCATTCTCCCTCTACG 

2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 

3 081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 
3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 
3161 ATACAGACG AACTGAAGTTTAGCAACTGCGTAGAAGAGGA 
3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 
3241 GTAAAT CAAGAAGAATAC GGAGGTGCGTACACTTCTCGTA 
3281 AT CGAG GATATAACG AAGCTC.CTTCCGT AC CAG CTGATTA 

3321 TGCGTC AGTCTATGAAG AAAAATCGTATACAGATGGACGA 

• « • • 
3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 

3441 ATACTT CCC AG AAACCGATAAGG TATGG ATTGAG ATTGGA 
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3481 G AAACG GAAGG AACATTTATCGT GG ACAG CGTG G AATTAC 3520 
3521 TCCTTATGGAGGAA 3534. 



F. a structural gene which encodes a full-length insecticidal protein of B.t.k. HD-73, having the sequence:



1 ATGG ACAACAACCCAAACATC AACG AAT GCATTCCATACA . .40- 

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA • 80 . 

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120 

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160 

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200 

201 CTTT GGTC C ATCTC AATGGG ATGCATTC CT GGTG C AAATT 240 

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280 

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320 

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360 

3 61 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400 

401 T C AACGACAT G AAC AG CGCCTT G ACC AC AG CT ATCC CATT 440 

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480 

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520 

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560 
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561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 

.601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 

641 GCTTGG AGCGTGTCTG G G GTCCTG ATTCTAG AGATTGGAT 

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 

721 TT GG AC ATTGTGTCTCTCTTCCC GAACTATGACTCCAGAA 

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 

. 801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 
• • • • 

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 

1121 GTATC AACAACC AGC AACTTTCCGTTCTT GACGGAACAGA 
• • . • 

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 

. . .). ' ■ . 

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 

128 1 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 
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• . • * • . 
1321 AGC AAC AGTTC C GTG AGCATCATCAGAGCTCCTATGTTCT 1360> 

1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC . 1400 

1401 ATC C GATAGTAT T ACTC AAATCCCTGCAGTG AAGGG AAAC 1440 

1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA . 1480. 

1481 CTGGTGG AGAC CTCGTTAGACTC AACAGC AGT GGAAATAA . .' 1520 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC . 1560 : 

1561 CC AT CCACATCT ACCAGATATAG AG TTCGTGTG AGGTATG 1600 

1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640 

1 64 1 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680 

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720 

"1721 AAAGTGCCAATGCrrTTACATCTTCACTCGGTAACATCGT 1 7 60 

• • . • . 

1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800 

1801 GACAG ATTCGAGTTCATTCCAGT TACTGCAACACTCGAGG 1840. 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 

1921 GTAAC G GATTATCATATTGATC AAGTGTCC AATTT AGTTA 1960 

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000 

2001 ATTGTC C G AGAAAGTCAAAC ATGCG AAG C G ACTCAGTG AT 2040 

• • • • 

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080 
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2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 

2121 . TACCATCCAAGGAGGGGATGACGTA3TTAAAGAAAATTAC 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 

2201. ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 

2321 AAAC AGTAAATG7GCCAGGTACG GGTTCCTTATGG CCGCT 

23 61 TTCAGC CC AAAGTCCAATCGGAAAGTGTGGAGAGC CG AAT 

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 

244 1 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 

2521 AATG AGGAC CTAGGT GT ATGGGTGATCTTTAAG ATTAAGA 

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 

2 601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 

.2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 

2721 ATCTGTAGATGCT1TATTTGTAAACTCTCAATATGATCAA 

27 61 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 
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2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 

• • • ' " " 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920 

"2921 ATGCG AG AAATG TC ATT AAAAAT GGTG ATTTT AAT AATGG 2960 

• • ♦ • ■ 

2951 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 
• ' • « ' .-■ 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040 
• ' • . • •' • 

30 4 1 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 

• • • ■ • 

•3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA. 3120 

• • • ■ • • 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 

3161 ATACAGACGAACTGAAGTTTAGC AACTGCGTAGAAGAGGA 3200 
3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240 
3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 

3281 ATCG AG G ATATAACG AAGCTC CT TC C GTACCAGCTGATTA 3320 

• • • • 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 

33 61 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400 
3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440 

3441 ATACTTCCC AG AAACCG AT AAGG T ATGG ATTG AG ATTGGA 3480 

• • • • • 

'3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520 

3521 TCCTTATGGAGGAA 3534. 
G. a structural gene which encodes a full-length insecticidal protein of B.t.k. HD-73, having the sequence:



1 ATGG AC AAC AACC C AAAC ATCAAC G AATG C ATTCC AT ACA 40. 

. 41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80 

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120 
• • • • 

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160 

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200 

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240 

241 G AGC AGTT G ATCAACCAG AGG ATC G AAGAGTTCGCCAGGA 280 

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320- 

321 C CAAAT CT AT GCAGAG AGCTTC AG AGAGTGGGAAGCC GAT 360" 

361 CCT ACT AACC CAG CTCTCC G C G AGG AAATGC GT ATTC AAT 400 

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440 

. • • . • • • 

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480 

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520 

• • • • 

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560 

561 AAC CATC AATAG C CGTT AC AACG AC CTTACT AGGCTG ATT 600 

• • • ■ • 

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640 

■ • •• • • 

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680 
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440 
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC

1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG

1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA

1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT

1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC

1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG

1841 CTGAGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGC

1881 CCTCTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAAC

1921 GTTACTGACTATCACATTGACCAAGTGTCCAACTTGGTCA

1961 CCTACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGA

2001 ACTCTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGAC

2041 GAGAGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCA

2081 ACAGGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGAT

2121 CACCATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTAC

2161 GTCACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCT 
2201 ACTTGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTT 2240

2241 CACCAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAA 2280

2281 GACCTTGAAATCTACTCGATCAGGTACAATGCCAAGCACG 2320

2321 AGACCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACT 2360

2361 TTCTGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAAC 2400

2401 AGATGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACT 2440

2441 GCTCCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCA 2480

2481 TCACTTCTCCTTGGACATCGATGTGGGATGTACTGACCTG 2520

2521 AATGAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGA 2560

2561 CCCAAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCT 2600

2601 CGAAGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTG 2640

2641 AAGAGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAAC 2680

2681 TCGAATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGA 2720

2721 GTCCGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAG 2760

2761 TTGCAAGCCGACACCAACATCGCCATGATCCACGCCGCAG 2800

2801 ACAAACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGA 2840

2841 GTTGTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAG 2880

2881 GAACTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACG 2920

2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960 






EP 0 385 962 B1 

2961 CCTCAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAG 3000 

3001 GAACAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGT 3040 

3041 GGGAAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGG 3080

3081 TAGAGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGA 3120

3121 TACGGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACA 3160

3161 ACACCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGA 3200

3201 AATCTATCCCAACAACACCGTTACTTGCAACGACTACACT 3240

3241 GTGAATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTA 3280

3281 ACAGAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTA 3320

3321 TGCCTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGA 3360 

3361 CGTGAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACT 3400 

3401 ACACACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGA 3440 


3441 GTACTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGT 3480 

3481 GAAACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTC 3520 

3521 TCTTGATGGAGGAA 3534. 
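Each row of a listing such as the one above gives the position of its first base, up to forty bases, and the position of its last base, so a row can be checked mechanically against its own numbering. The following Python sketch is illustrative only and forms no part of the claims; the function names and the regular expression are assumptions chosen for this example.

```python
import re

# One listing row: start position, bases (possibly split by stray spaces),
# end position, and an optional closing full stop.
ROW = re.compile(r"^\s*(\d+)\s+([ACGT ]+?)\s+(\d+)\s*\.?\s*$")

def parse_row(line):
    """Return (start, bases, end) for a well-formed row, else None."""
    match = ROW.match(line)
    if match is None:
        return None
    start = int(match.group(1))
    bases = match.group(2).replace(" ", "")
    end = int(match.group(3))
    return start, bases, end

def row_is_consistent(line):
    """True when the number of bases agrees with the two position labels."""
    parsed = parse_row(line)
    if parsed is None:
        return False
    start, bases, end = parsed
    return len(bases) == end - start + 1

print(row_is_consistent("3521 TCTTGATGGAGGAA 3534."))
```

A row whose base count disagrees with its labels, or which carries characters other than A, C, G and T, is reported as inconsistent and can then be compared by eye with the printed patent.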

H. a structural gene which encodes an insecticidal protein of B.t.t., having the sequence:

1 ATGACTGCAGACAACAACACCGAAGCCCTCGACAGTTCTA 40 

41 CCACTAAGGATGTTATCCAGAAGGGTATCTCCGTTGTGGG 80 
81 AGACCTCTTGGGCGTGGTTGGATTTCCCTTCGGTGGAGCC 120
121 CTCGTGAGCTTCTATACAAACTTTCTCAACACCATTTGGC 160
161 CAAGCGAGGACCCTTGGAAAGCATTCATGGAGCAAGTTGA 200
201 AGCTCTTATGGATCAGAAGATTGCAGATTATGCCAAGAAC 240

241 AAGGCTTTGGCAGAACTCCAGGGCCTTCAGAACAATGTGG 280

281 AGGACTACGTGAGTGCATTGTCCAGCTGGCAGAAGAACCC 320

321 TGTTAGCTCGAGAAATCCTCACAGCCAAGGTAGGATCAGA 360

361 GAGTTGTTCTCTCAAGCCGAATCCCACTTCAGAAATTCCA 400

401 TGCCTAGCTTTGCTATCTCCGGTTACGAGGTTCTTTTCCT 440

441 CACTACCTATGCTCAAGCTGCCAACACCCACTTGTTTCTC 480

481 CTTAAGGACGCTCAAATCTATGGAGAAGAGTGGGGATACG 520

521 AGAAAGAGGACATTGCTGAGTTCTACAAGCGTCAACTTAA 560

561 GCTCACCCAAGAGTACACTGACCATTGCGTGAAATGGTAT 600

601 AACGTTGGTCTCGATAAGCTCAGAGGCTCTTCCTACGAGT 640

641 CTTGGGTGAACTTCAACAGATACAGGAGAGAGATGACCTT 680

681 GACTGTGCTCGATCTTATCGCACTCTTTCCCTTGTACGAT 720

721 GTGAGACTCTACCCAAAGGAAGTGAAAACTGAGCTTACCA 760

761 GAGACGTGCTCACTGACCCTATTGTCGGAGTCAACAACCT 800

801 TAGGGGTTATGGAACTACCTTCAGCAATATCGAAAACTAC 840
841 ATTAGGAAACCACATCTCTTCGACTATCTTCACAGAATTC 880

881 AATTCCACACAAGGTTTCAACCAGGATACTATGGTAACGA 920

921 CTCCTTCAACTATTGGTCCGGTAACTATGTTTCCACCAGA 960

961 CCAAGCATTGGATCTAATGACATCATCACATCTCCCTTCT 1000

1001 ATGGTAACAAGTCCAGTGAACCTGTGCAGAACCTTGAGTT 1040

1041 CAACGGCGAGAAAGTCTATAGAGCCGTCGCAAACACCAAT 1080

1081 CTCGCTGTGTGGCCATCCGCAGTTTACTCAGGCGTCACAA 1120

1121 AGGTGGAGTTTAGTCAGTATAACGATCAGACCGATGAGGC 1160

1161 CAGCACCCAGACTTACGACTCCAAACGTAACGTTGGCGCA 1200

1201 GTCTCTTGGGATTCTATCGACCAATTGCCTCCAGAAACCA 1240

1241 CAGACGAACCATTGGAGAAGGGCTACAGCCACCAACTTAA 1280

1281 CTATGTGATGTGCTTCTTGATGCAAGGTTCCAGAGGGACC 1320

1321 ATTCCAGTGTTGACCTGGACACACAAGTCCGTGGACTTCT 1360

1361 TCAACATGATCGATAGCAAGAAGATCACTCAACTTCCCTT 1400

1401 GGTGAAAGCCTACAAGCTGCAATCTGGTGCTTCCGTTGTC 1440

1441 GCAGGTCCCAGATTCACTGGAGGTGACATCATCCAGTGCA 1480

1481 CAGAGAACGGCAGCGCAGCTACTATCTACGTGACACCTGA 1520

1521 TGTGTCTTACTCTCAGAAGTACAGGGCACGTATTCATTAC 1560

1561 GCATCTACCAGCCAGATCACCTTCACACTCAGCTTGGATG 1600
1601 GAGCACCCTTCAACCAGTATTACTTTGACAAGACCATCAA 1640

1641 CAAAGGTGACACTCTCACATACAATAGCTTCAACTTGGCA 1680

1681 AGTTTCAGCACACCATTTGAACTCTCAGGCAACAATCTTC 1720

1721 AGATCGGCGTCACCGGTCTCAGCGCCGGAGACAAAGTCTA 1760

1761 CATCGACAAGATTGAGTTCATCCCAGTGAAC 1791. 

I. a structural gene which encodes an insecticidal protein of B.t. entomocidus, having the sequence:

1 ATGGAGGAGAACAACCAAAACCAATGCATTCCATACAACT 40

41 GCTTGAGTAACCCAGAAGAGGTATTGCTTGATGGAGAACG 80

81 CATTTCAACCGGTAACTCTTCCATCGACATCTCCTTGTCC 120

121 TTGGTCCAGTTTCTGGTCAGCAACTTCGTGCCAGGTGGTG 160

161 GGTTCCTTGTCGGACTAATTGACTTCGTCTGGGGTATCGT 200
201 TGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATTGAG 240

241 CAGTTGATCAACGAGAGGATCGCTGAGTTCGCCAGGAACG 280

281 CTGCCATCGCTAACTTGGAAGGATTGGGCAATAACTTCAA 320

321 CATCTATGTGGAGGCCTTCAAAGAGTGGGAAGAGGACCCT 360
361 AACAACCCAGAGACCCGCACTAGGGTGATCGACAGATTCA 400
401 GAATCTTGGACGGCCTCTTGGAGAGAGATATCCCATCCTT 440 
481 GCTCAAGCAGCTAATCTTCACCTCGCTATCCTTCGAGACA
521 GTGTCATCTTTGGGGAAAGGTGGGGATTGACOICTATCAA

561 CGTCAATGAGAATTACAACAGACTTATCAGGCACATTGAC

601 GAGTACGCCGACCACTGTGCTAACACCTACAACCGTGGCT

641 TGAACAATCTCCCTAAGTCTACTTATCAAGATTGGATTAC

681 CTACAACAGGTTGAGGAGAGACTTGACCCTCACAGTTTTG

721 GACATTGCAGCTTTCTTCCCGAACTATGACAACAGGAGAT

761 ACCCTATCCAACCAGTGGGTCAACTTACCAGAGAAGTCTA

801 TACTGACCCACTTATCAACTTCAACCCTCAGTTGCAAAGT

841 GTCGCCCAACTTCCCACATTCAACGTCATGGAGTCCAGCC

881 GTATCAGGAACCCACACTTGTTTGACATCTTGAACAACCT
921 TACTATCTTCACCGATTGGTTCAGCGTTGGGCGTAACTTC

1001 GTGGGAACATTACCTCTCCTATCTATGGACGTGAGGCAAA

1041 CCAGGAGCCACCACGTAGTTTCACCTTCAACGGTCCAGTC

1081 TTCAGAACCTTGTCTAACCCTACCTTGAGATTGCTCCAGC

1121 AACCTTGGCCAGCTCCACCTTTCAACCTTAGAGGTGTTGA

1161 GGGCGTTGAGTTCTCTACTCCTACCAACTCCTTCACTTAC

1201 AGAGGTAGAGGAACCGTTGATTCCTTGACCGAACTCCCAC
1241 CAGAGGACAATAGCGTGCCACCCAGGGAAGGCTACTCCCA 1280

1281 CAGGTTGTGCCACGCAACCTTCGTGCAGCGTTCCGGAACT 1320

1321 CCATTCCTCACTACAGGAGTTGTGTTCTCATGGACTGATC 1360

1361 GTAGTGCTACTCTCACTAATACCATTGATCCCGAGAGGAT 1400

1401 CAATCAAATCCCATTGGTCAAGGGTTTCCGTGTGTGGGGA 1440

1441 GGAACTTCTGTCATCACAGGACCAGGCTTCACAGGAGGTG 1480

1481 ATATTCTTAGAAGAAACACTTTTGGCGACTTTGTGAGCCT 1520

1521 CCAAGTTAACATCAACTCTCCAATTACTCAAAGATATCGT 1560

1561 CTCAGGTTTCGTTACGCATCTTCCCGTGACGCTAGAGTCA 1600

1601 TCGTGCTCACCGGAGCAGCTTCTACCGGTGTCGGTGGACA 1640

1641 AGTCTCCGTGAACATGCCACTCCAGAAGACTATGGAGATC 1680

1681 GGCGAGAACTTGACATCCAGGACCTTCAGATACACCGACT 1720

1721 TCTCTAACCCTTTCAGTTTCCGTGCCAACCCTGACATCAT 1760

1761 TGGCATTAGCGAACAACCTCTCTTTGGAGCTGGTAGCATC 1800

1801 TCATCTGGCGAATTGTACATTGACAAGATTGAGATCATTC 1840

1841 TTGCCGACGCTACCTTCGAGGCTGAGTCTGACCTTGAGAG 1880

1881 AGCCCAGAAGGCTGTGAACGCCCTCTTTACCTCCTCTAAT 1920

1921 CAGATTGGCTTGAAAACTGACGTTACTGACTATCACATTG 1960 

1961 ACCAAGTGTCCAACTTGGTCGACTGCCTTAGCGATGAGTT 2000 
2001 CTGCCTCGACGAGAAGCGTGAACTCTCCGAGAAAGTTAAA
2041 CACGCCAAGCGTCTCAGCGACGAGAGGAATCTCTTGCAAG

2081 ACCCCAACTTCAGAGGCATCAACAGGCAGCCAGACCGTGG

2121 TTGGAGAGGAAGCACCGACATCACCATCCAAGGAGGCGAC

2161 GATGTGTTCAAGGAGAACTACGTCACCCTCCCAGGAACTG

2201 TGGACGAGTGCTACCCTACCTACTTGTACCAGAAGATCGA

2241 TGAGTCCAAACTCAAAGCCTACACCAGGTATGAACTTAGA

2281 GGCTACATCGAAGACAGCCAAGACCTTGAAATCTACCTCA

2321 TCAGGTACAATGCCAAGCACGAGATCGTGAATGTCCCAGG

2361 TACTGGTTCCCTCTGGCCACTTTCTGCCCAAATGCCCATT

2401 GGGAAGTGTGGAGAGCCTAACAGATGCGCTCGACACCTTG

2441 AGTGGAATCCTGACTTGGACTGCTCCTGCAGGGATGGCGA

2481 GAAGTGTGCCCACCATTCTCATCACTTCACCTTGGACATC

2521 GATGTGGGATGTACTGACCTGAATGAGGACCTCGGAGTCT

2561 GGGTCATCTTCAAGATCAAGACCCAAGACGGACACGCAAG

2601 ACTTGGCAACCTTGAGTTTCTCGAAGAGAAACCATTGCTC

2641 GGTGAAGCTCTCGCTCGTGTGAAGAGAGCAGAGAAGAAGT

2681 GGAGGGACAAACGTGAGAAACTCCAACTCGAGACTAACAT

2721 CGTTTACAAGGAGGCCAAAGAGTCCGTGGATGCTTTGTTC
2761 GTGAACTCCCAATATGATAGGTTGCAAGTGGACACCAACA 2800

2801 TCGCCATGATCCACGCTGCAGACAAACGTGTGCACAGGAT 2840

2841 TCGTGAGGCTTACTTGCCTGAGTTGTCCGTGATCCCTGGT 2880

2881 GTGAACGCTGCCATCTTCGAGGAACTTGAGGGACGTATCT 2920

2921 TTACCGCATACTCCTTGTACGATGCCAGAAACGTCATCAA 2960

2961 GAACGGTGACTTCAACAATGGCCTCTTGTGCTGGAATGTG 3000

3001 AAAGGTCATGTGGACGTGGAGGAACAGAACAATCACCGTT 3040

3041 CCGTCCTGGTTATCCCTGAGTGGGAAGCTGAAGTGTCCCA 3080

3081 AGAGGTTAGAGTCTGTCCAGGTAGAGGCTACATTCTCCGT 3120

3121 GTGACCGCTTACAAGGAGGGATACGGTGAGGGTTGCGTGA 3160

3161 CCATCCACGAGATCGAGGACAACACCGACGAGCTTAAGTT 3200

3201 CTCCAACTGCGTCGAGGAAGAAGTCTATCCCAACAACACC 3240

3241 GTTACTTGCAACAACTACACTGGGACCCAGGAAGAGTACG 3280

3281 AAGGTACCTACACTAGCCGTAACCAAGGTTACGACGAAGC 3320

3321 TTACGGAAACAATCCTTCCGTTCCTGCTGACTATGCCTCC 3360

3361 GTGTACGAGGAGAAATCCTACACAGATGGCAGACGTGAGA 3400

3401 ACCCTTGCGAGTCCAACAGAGGTTACGGTGACTACACACC 3440

3441 ACTTCCAGCAGGCTATGTTACCAAGGACCTTGAGTACTTT 3480

3481 CCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAAACCG 3520
3521 AGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCTTGAT 3560
3561 GGAGGAA 3567. 

J. a structural gene which encodes an insecticidal P2 protein, having the sequence:



1 ATGGACAACAACGTCTTGAACTCTGGTAGAACAACCATCT 

41 GCGACGCATACAACGTCGTGGCTCACGATCCATTCAGCTT 

81 CGAACACAAGAGCCTCGACACTATTCAGAAGGAGTGGATG

121 GAATGGAAACGTACTGACCACTCTCTCTACGTCGCACCTG

161 TGGTTGGAACAGTGTCCAGCTTCCTTCTCAAGAAGGTCGG

201 CTCTCTCATCGGAAAACGTATCTTGTCCGAACTCTGGGGT

241 ATCATCTTTCCATCTGGGTCCACTAATCTCATGCAAGACA

281 TCTTGAGGGAGACCGAACAGTTTCTCAACCAGCGTCTCAA

321 CACTGATACCTTGGCTAGAGTCAACGCTGAGTTGATCGGT

361 CTCCAAGCAAACATTCGTGAGTTCAACCAGCAAGTGGACA

401 ACTTCTTGAATCCAACTCAGAATCCTGTGCCTCTTTCCAT

441 CACTTCTTCCGTGAACACTATGCAGCAACTCTTCCTCAAC

481 AGATTGCCTCAGTTTCAGATTCAAGGCTACCAGTTGCTCC

521 TTCTTCCACTCTTTGCTCAGGCTGCCAACATGCACTTGTC
561 CTTCATACGTGACGTGATCCTCAACGCTGACGAATGGGGA 
601 ATCTCTGCAGCCACTCTTAGGACATACAGAGACTACTTGA 640

641 GGAACTACACTCGTGATTACTCCAACTATTGCATCAACAC 680

681 TTATCAGACTGCCTTTCGTGGACTCAATACTAGGCTTCAC 720

721 GACATGCTTGAGTTCAGGACCTACATGTTCCTTAACGTGT 760

761 TTGAGTACGTCAGCATTTGGAGTCTCTTCAAGTACCAGAG 800

801 CTTGATGGTGTCCTCTGGAGCCAATCTCTACGCCTCTGGC 840

841 AGTGGACCACAGCAAACTCAGAGCTTCACAGCTCAGAACT 880

881 GGCCATTCTTGTATAGCTTGTTCCAAGTCAACTCCAACTA 920

921 CATTCTCAGTGGTATCTCTGGGACCAGACTCTCCATAACC 960

961 TTTCCCAACATTGGTGGACTTCCAGGCTCCACTACAACCC 1000

1001 ATAGCCTTAACTCTGCCAGAGTGAACTACAGTGGAGGTGT 1040

1041 CAGCTCTGGATTGATTGGTGCAACTAACTTGAACCACAAC 1080

1081 TTCAATTGCTCCACCGTCTTGCCACCTCTGAGCACACCGT 1120

1121 TTGTGAGGTCCTGGCTTGACAGCGGTACTGATCGCGAAGG 1160

1161 AGTTGCTACCTCTACAAACTGGCAAACCGAGTCCTTCCAA 1200

1201 ACCACTCTTAGCCTTCGGTGTGGAGCTTTCTCTGCACGTG 1240

1241 GGAATTCAAACTACTTTCCAGACTACTTCATTAGGAACAT 1280

1281 CTCTGGTGTTCCTCTCGTCATCAGGAATGAAGACCTCACC 1320

1321 CGTCCACTTCATTACAACCAGATTAGGAACATCGAGTCTC 1360
1361 CATCCGGTACTCCAGGAGGTGCAAGAGCTTACCTCGTGTC 1400

1401 TGTCCATAACAGGAAGAACAACATCTACGCTGCCAACGAG 1440

1441 AATGGCACCATGATTCACCTTGCACCAGAAGATTACACTG 1480

1481 GATTCACCATCTCTCCAATCCATGCTACCCAAGTGAACAA 1520

1521 TCAGACACGCACCTTCATCTCCGAAAAGTTCGGAAATCAA 1560

1561 GGTGACTCCTTGAGGTTCGAGCAATCCAACACTACCGCTA 1600

1601 GGTACACTTTGAGAGGCAATGGAAACAGCTACAACCTTTA 1640

1641 CTTGAGAGTTAGCTCCATTGGTAACTCCACCATCCGTGTT 1680

1681 ACCATCAACGGACGTGTTTACACAGTCTCTAATGTGAACA 1720

1721 CTACAACGAACAATGATGGCGTTAACGACAACGGAGCCAG 1760

1761 ATTCAGCGACATCAACATTGGCAACATCGTGGCCTCTGAC 1800

1801 AACACTAACGTTACTTTGGACATCAATGTGACCCTCAATT 1840

1841 CTGGAACTCCATTTGATCTCATGAACATCATGTTTGTGCC 1880 

1881 AACTAACCTCCCTCCATTGTAC 1902 

or

K. a structural gene sequence which encodes a fusion protein comprising the N-terminal 610 amino acids of B.t.k. HD-1 and the C-terminal 567 amino acids of B.t.k. HD-73, which gene has the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800 
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880 

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920 

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960 

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040 

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080 

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGCT 1200

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400

1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440

1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480 

1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520 
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCI 1560

1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600 

1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640 

1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680

1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720

1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 1760 

1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800 

1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCCTCGAGGCTG 1840 

1841 AGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGCCCT 1880 

1881 CTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAACGTT 1920

1921 ACTGACTATCACATTGACCAAGTGTCCAACTTGGTCACCT 1960 

1961 ACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGAACT 2000

2001 CTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGACGAG 2040 

2041 AGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCAACA 2080 

2081 GGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGATCAC 2120 

2121 CATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTACGTC 2160 

2161 ACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCTACT 2200 

2201 TGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTTCAC 2240

2241 CAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAAGAC 2280
2281 CTTGAAATCTACTCGATCAGGTACAATGCCAAGCACGAGA 2320

2321 CCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACTTTC 2360

2361 TGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAACAGA 2400

2401 TGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACTGCT 2440

2441 CCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCATCA 2480

2521 GAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGACCC 2560

2561 AAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCTCGA 2600

2601 AGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTGAAG 2640

2641 AGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAACTCG 2680

2681 AATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGAGTC 2720

2721 CGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAGTTG 2760

2761 CAAGCCGACACCAACATCGCCATGATCCACGCCGCAGACA 2800

2801 AACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGAGTT 2840

2841 GTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAGGAA 2880

2921 CCAGAAACGTCATCAAGAACGGTGACTTCAACAATGGCCT 2960

2961 CAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAGGAA 3000

3001 CAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGTGGG 3040 
3041 AAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGGTAG 3080

3081 AGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGATAC 3120

3121 GGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACAACA 3160

3161 CCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGAAAT 3200

3201 CTATCCCAACAACACCGTTACTTGCAACGACTACACTGTG 3240

3241 AATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTAACA 3280

3281 GAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTATGC 3320

3321 CTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGACGT 3360

3361 GAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACTACA 3400

3401 CACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGAGTA 3440

3441 CTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAA 3480

3481 ACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCT 3520

3521 TGATGGAGGAA 3531. 



Claims

1. A method for modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus thuringiensis to enhance the expression of said protein in plants, which comprises:

a) identifying regions within said sequence with more than four consecutive adenine or thymine nucleotides,

b) modifying the regions of step a) which have two or more polyadenylation signals within a ten-base sequence to remove said signals while maintaining a gene sequence which encodes said protein, and

c) modifying the 15 to 30 base regions surrounding the regions of step a) to remove major plant polyadenylation signals, consecutive sequences containing more than one minor polyadenylation signal and consecutive sequences containing more than one ATTTA sequence while maintaining a gene sequence which encodes said protein.
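Steps a) and b) of claim 1 lend themselves to a mechanical scan. The following Python sketch is illustrative only and forms no part of the claimed method. It locates runs of more than four consecutive A or T bases and flags neighbouring polyadenylation signals that fit inside a ten-base window. The function names are hypothetical, the hexamers are those recited in claim 5, and reading "within a ten-base sequence" as "both hexamers lie inside one window of ten bases" is an assumption.

```python
import re

# Hexamers recited in claim 5 as plant polyadenylation signals.
POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)

def at_runs(seq, min_len=5):
    """Step a): (start, end) spans of runs of more than four consecutive A/T."""
    pattern = "[AT]{%d,}" % min_len
    return [m.span() for m in re.finditer(pattern, seq.upper())]

def signal_starts(seq):
    """0-based start position of every listed hexamer found in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 5) if seq[i:i + 6] in POLYA_SIGNALS]

def clustered_signals(seq, window=10):
    """Step b), as assumed here: neighbouring signals sharing one window."""
    starts = signal_starts(seq)
    max_gap = window - 6  # two hexamers fit in the window if starts differ by <= 4
    return [(a, b) for a, b in zip(starts, starts[1:]) if b - a <= max_gap]

example = "GGCAATAAATAAAGGC"
print(at_runs(example))            # [(3, 13)]
print(clustered_signals(example))  # [(3, 7)]
```

Step c) distinguishes major from minor signals; that classification is not given in the claims reproduced here, so the sketch does not attempt it.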
2. A method for modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus thuringiensis to enhance the expression of said protein in plants, which comprises:

a) removing the polyadenylation signals contained in said wild-type gene while retaining a sequence which encodes said protein, and

b) removing the ATTTA sequences contained in said wild-type gene while retaining a sequence which encodes said protein.

3. A method according to claim 2, further comprising removing self-complementary sequences and replacing such sequences with non-self-complementary DNA comprising plant-preferred codons while retaining a structural gene sequence encoding said protein.

4. A method according to claims 1 to 3, further comprising using plant-preferred sequences in the removal of the polyadenylation signals and ATTTA sequences.

5. A method according to claims 1 to 3, wherein the plant polyadenylation signals are selected from the group consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.
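A tally of these motifs makes it easy to compare a wild-type sequence with a modified one. The Python sketch below is illustrative only: the function names are hypothetical, overlapping occurrences are counted separately (an assumption), and the hexamers are those recited in claim 5 together with the ATTTA sequence of claims 1 and 2.

```python
import re

# Hexamers recited in claim 5 as plant polyadenylation signals.
POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)

def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, overlapping ones included."""
    return len(re.findall("(?=%s)" % re.escape(motif), seq.upper()))

def motif_census(seq):
    """Map each listed hexamer, and ATTTA, to its count; zero counts are omitted."""
    census = {}
    for motif in POLYA_SIGNALS + ("ATTTA",):
        count = count_overlapping(seq, motif)
        if count:
            census[motif] = count
    return census

print(motif_census("AATAAATTTA"))  # {'AATAAA': 1, 'ATTTA': 1}
```

An empty dictionary means that none of the listed motifs occurs in the sequence examined.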

6. A method for improving the expression of a heterologous gene in plants, wherein said gene comprises a modified chimeric gene comprising a promoter which functions in plant cells operably linked to a structural coding sequence and to a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least part of which is derived from a Bacillus thuringiensis protein, wherein said method comprises modifying said structural coding sequence so that said sequence comprises a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and said structural coding sequence contains no more than 5 consecutive nucleotides consisting of either adenine or thymine residues.
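The limit stated at the end of claim 6 can be checked directly on a candidate coding sequence. The Python sketch below is illustrative only; the function names are hypothetical, and it assumes that a run may mix adenine and thymine, which is one reading of the claim wording.

```python
import re

def longest_at_run(seq):
    """Length of the longest stretch made up only of A and T bases."""
    runs = re.findall("[AT]+", seq.upper())
    return max((len(run) for run in runs), default=0)

def meets_at_limit(seq, limit=5):
    """True when no stretch of consecutive A/T bases is longer than `limit`."""
    return longest_at_run(seq) <= limit

print(meets_at_limit("GGAATTAGG"))   # True: the longest run, AATTA, has 5 bases
print(meets_at_limit("GGAATTAAGG"))  # False: AATTAA has 6 bases
```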

7. A method for improving the expression of a heterologous gene in plants, wherein said gene comprises a modified chimeric gene comprising a promoter which functions in plant cells operably linked to a structural coding sequence and to a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least part of which is derived from a Bacillus thuringiensis protein, wherein said method comprises modifying said structural coding sequence so that said sequence comprises a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and has the following characteristics:

said structural coding sequence comprises a region which is complementary to the following sequence:

GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC
1 5 10 15 20 25 30 35 40 45

said region in said coding sequence having had 2 AACCAA sequences and 1 AATTAA sequence removed.
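The region of the coding strand that is complementary to the oligonucleotide of claim 7 can be derived by taking its reverse complement. The Python sketch below is illustrative only: the names are hypothetical, and the oligonucleotide is transcribed from claim 7 with scan artifacts removed, which is itself an assumption about the damaged characters.

```python
# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the strand that pairs with seq, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Oligonucleotide of claim 7, as transcribed for this example.
OLIGO = "GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC"
REGION = reverse_complement(OLIGO)

print(REGION)
print("AACCAA" in REGION, "AATTAA" in REGION)  # False False
```

The derived region can then be searched for in a candidate coding sequence with an ordinary substring test.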

8. A method according to claim 7, wherein said structural coding sequence encodes an insecticidal protein at least part of which is derived from Bacillus thuringiensis kurstaki HD-1.

9. A method according to claim 7 or 8, wherein the plant is a tobacco plant.

10. A modified chimeric gene containing a promoter which functions in plant cells operably linked to a structural coding sequence and to a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least part of which is derived from a Bacillus thuringiensis protein, wherein said structural coding sequence comprises a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and is selected from:

A. a structural gene which encodes an insecticidal protein of B.t.k. HD-1, having the sequence:
1 ATGGCTATAGAAACTGGTTACACCCC31ATCGAIATTTCCT 40

41 TGTCGCTAACGCAAmCTmGAGTGAATTTGTTCCCGG .80 

81 TGCTGGATTTGTGTTAGGACTAGTTGATATTATCTGGGGA 120

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160

161 TTGAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAG 200

201 GAATCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240 

241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGQGAAGCAG 280

281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320 

321 ATTCAATGACArGAACAGTGCCCTTACAACCGCTATTCCT 360 

401 TGTACGTTCAAGCTGCCAACCTCCACCTCTCAGTTTTGAG 440

441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480

481 GCGACTATCAATAGTCGTTATAATGAITTAACTAGGCTTA 520 

521 TTGGCAACIATACAGATCATGCTGTACGCTGGTACAATAC 560 

561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600

601 ATCAGGTACAACCAGTTCAGAAGAGAGCTTACACTAACTG 640

641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 680

681 AACGTATCCAATTCGAACAGTTrCCCAATTAACAAGAGAA 720 
721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760

761 TTCGAGGCTCGGCTCAGGGCATAGAAGGAAGTATXAGGAG 800

801 TCCACATTTGATGGATATACTTAATAGIATAACCATCTAT 840

841 ACGGATGCrCATAGAGGAGAATACTACTGGTCCGGTCACC 880

881 AGATCATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920

921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960

961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000

1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAACAT 1040

1041 CGGGATCAACAACCAACAACTATCTGTTCTTGACGGGACA 1080

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120 
1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAiir 1160

1161 ACCGCC^caGAaTAAC^GiGCCaCCTAGGCaAGGaiTT 1200

1201 AGTCATCGAITAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240

1241 TTAGTAATAGIAGTGTAAGTATAATAAGAGCTCCTATGTT 1280

1281 CTCTTGGArACATCGTAGTGCTGAGTTCAACAACATCATC 1320

1321 CCTTCATCACAAATCACCCAAATCCCACTCACCAAGTOa 1360

1361 CTAAXCTTGCCTCTGGAACTTCTiSTCGTTAAAGGACC&GG 1400

1401 ATTTACAGGAGGAGAIIATTCTTCGAAGAACTTCACCTGGC 1440

1441 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAI 1480

1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520

1521 AAACCTTCAGTTCCACACATCaATTGACGGAAGACCTATT 1560

1561 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600

1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640

1641 TCCGTTTAACTTTTCAAATGGATCAAGTGXATTTACGTXA 1680

1681 AGTGCTCATGTCTTCAAXTCAGGCAATGAAGTTTATAIAG 1720

1721 ATCGAAXTGAATTTGTTCCGGCA 1743.

B. a structural gene which encodes an insecticidal protein of B.t.k. HD-73, having the sequence:
1 ATGGCCATTGAAACCGGTTACACTCCCATCGACATCTCCT 40
41 TGTCCCTGACACAGTTTCTGCTCAGCGAGTTCGTGCCaCG 80

81 TGCTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGT 120 

121 ArCTTTGGTCCATCTCAATGGGAXGCATTCCTGGTGCaAA 160 

161 TTGAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAG 200 

201 GAACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAAXCTC 240

241 TACCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCG 280

281 ATCCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCA 320

321 &TTCAACGACATGAACAGCGCCTTGACCACAGCTATCCCA 360 

361 TTGTTCGCAGTCCACAACTACCAAGrrCCTCTCTTGTCCG 400

401 TGTACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCG 440 
441 AGACGTTAGC£n^TTTGGGC3lAAGGTGGGSaTTCGATGCT 480

481 GCAACCATCAATAGCCGTTACAACS^CTTACTAGGCTGA 520

521 TTGGAAACTACACCGACCACGCTGTTCGTTGGTACAAQ^ 560

561 TGGCTTGGAGCGTGTCTGGGGTCCTGATTCTAGaGATTGS 600 

601 ATTA(^TACAACCA<jTTCJU3GAQLGAATTGACCCTC&CA6 640 

641 TTTTGGACATTGTGTCTCTCTTCCCGaACTAIGACTCCaG 680

681 AACCTACCCTATCCGT&CAGTGTCCCAACTTACCAGAGAA 720 

721 ASCTAIACTAACCCAGTrCTTGAGAACTTCGACGGTAGCT 760

761 TCCGTGGT7CTGCCCAAGGTATCGAAGGC7CCATCAGGA6 800 


801 CCCACACTTGATGGACATCTTGAACAGCATAACTATCT^ 840 

841 ACCGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACC 880 

881 AGATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTT 920 

921 TACCTTTCCTCTCTATGGAACCA1GGGAAACGCCGCTCCA 960

961 CAACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTAC& 1000

1001 GAACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATAT 1040 
1041 CGGTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACA 1080

1081 GAGTTCGCCTATGGAACCTCTTCTMCTrGCCATCCGCTC 1120 

1121 rrrACAGAAAGAGCGGAACCGTTGArrCOTGGACGAAAT 1160 

1161 CCCACCACAGAACAACAATGTGCCACCCAGGCAAGGATTC 1200

1201 TCCCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGAT 1240

1241 TCAGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTT 1280

1281 CTC7TGGAIACACCGTAGTGCTGAGTTCAACAACATCAIC 1320

1321 GCATCCGATAGTATTACTCAAATCCCTGCAGrGAAGGGAA 1360

1361 ACTTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATT 1400

1401 CACTGGTGGAGACCTCGTrAGACTCAACAGCAGTGGAAAT 1440 

1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480
1481 TCCCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTfc 1520

1521 TGCrrCTGTGACCCCTATTCACCTCAACGTTAATTGGGGT 1560

1561 AATTCATCCATCrrCTCCAATACAGTXCCAGCTACAGCTA 1600 

1601 CCTCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTT 1640 

1641 TGAAAGTGCCAATGCTTTTACATCTTCACTCGGTAACAXC 1680 


1681 GTGGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTA 1720 

1721 TCGACAGATTCGaGTTCATTCCAGTTACTGCAACaCTCGA 1760 

1761 GGCTGAG 1767. 


C. a structural gene encoding an insecticidal protein of B.t.k. HD-1, having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

161 CTGGGT7CGTTCTCGGACTAGTTGACATCAXCTGGGGTAX 200

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280 

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320 


321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAX 360 
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGlAITCA&r 400

401 TCAACGACATGAACAGCGCCTTGACCACaGCTaTCCCATT 440 

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCT 520

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560

561 AACCATCAATAGCCGTTACAACG^CTTACTAGGCTGillT 600 

601 GGAAACTACACC GACCACGCTGTTCGTTGGTACiV&CACTG 640 

641- GCT7GGAGCGTGTCTGGGGTCCTGATTC7A5AGATTGGAX . 680 

681 TAGATACAACCAGTTCAGGAGAGAATTGACC CTCACAGTT 720. 

• . . ' • 
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 750 

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAX 800 

801 CTATACTAACCCAGTTCTTGAGAACTT CGAC GGTAGCTTC 840 

• • ... 

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880 

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400

1401 rTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCXACT 1440 

1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480

1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520

1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACrrTCX 1560

1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600 


1601 ACTTGCAATTCCACACCTC CATCGACGGAAGGCCTflTffli A 1640 

1641 TCAGGGTAACTTCTCCGCAACCAIGTGaAGCGGCSGCAAC 1680 

1681 TT6CAATCCGGCAGCTTCAGAACCGTCG GT1 T C ACIRCTC 1720 

1721 CTTTCAACTTCTCTAACGGATC^GCGTTTTCACCCTTAG 1760 


1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800 
1801 CGTATTGAUXT 1 G TGCCTGCCGAAG TTACCTTCGAGGCTG 1840 
1841 AGTAC 1845.

D. A structural gene encoding an insecticidal protein derived from B.t.k. HD-73, comprising the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800

801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440

1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560

1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600

1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640

1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760

1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800

1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTAATGCG 1880

1881 CTGTTTACGTCTACAAACCAGCTTGGACTCAAGACAAATG 1920
1921 G 1921.

E. A structural gene encoding the full-length insecticidal protein of B.t.k. HD-73, comprising the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40 

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760

801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440

1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560

1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600

1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640

1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760

1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800

1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880

1881 GCTGTTTACGTCTACAAACCAGCTCGGCCTCAAGACCAAff 1920 . 

1921 GTGACGGATTATCATATTGATCAAGTGTCCAACTTGGTGA 1960

1961 CCTACCTCAGCGATGAGTTCTGTCTGGATGAAAAGCGAGA 2000

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080

2121 TACCATCCAGGGAGGTGACGACGTGTTCAAGGAGAACTAC 2160

2161 GTCACACT&TCAGGT^CTTTGATGAGTGCTATCCAACXT 2200 

2201 ACCTCTACCAGAAGATCGACGAGTCCAAGTTGAAAGCOT 2240 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280

2281 GACCTCGAGATCTACCTCATCCGCTACAATGCAAAACATG 2320

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560

2561 CGCAAGATGGSCACGCAAGACTAGGGAATCl^iaaGTXTCT 2600 

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640

2 641 AAAAGAGCGGAGAAA^TGGAGAGACAAACGTGAGAAGT 2680 

2681 TGGAATGGGAGACCAACATCGTCTACAAAGAGGCAAAAGA 2720

2721 ATCTCTAGATGCTrTATTTGTAAACTCTCAMATGATCaA 2760 

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTCTACG 2920

2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000

3081. TCGTGGCTATATCCTTCGTGTCACAGCGTAC3UGS3U3GS& 3120 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200

3201 AAXCTAXCQIAATAACACGGTAACGTGTAATGATTAIACT 3240 

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA . 3320 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400 

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520

3521 TCCTTATGGAGGAA 3534. 



F. A structural gene encoding a full-length insecticidal protein of B.t.k. HD-73, comprising the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800

801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440

1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560

1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600

1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640

1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760

1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800

1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120

2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240

2241 TACCCGTTATCAATTAAGAGGCTATATCGAAGATAGTCAA 2280

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840

2841 GCTGTCTGTGATTCCGGCTGTCAATGCGGCTATTTTTGAA 2880

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.

G. A structural gene encoding a full-length insecticidal protein of B.t.k. HD-73, comprising the sequence:



1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40 

41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80

81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200

201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440

441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480

481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680

681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800

801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880

881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080

1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200

1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360

1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400

1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440

1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560

1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600

1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640

1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760

1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800

1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840

1841 CTGAGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGC 1880

1881 CCTCTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAAC 1920

1921 GTTACTGACTATCACATTGACCAAGTGTCCAACTTGGTCA 1960

1961 CCTACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGA 2000

2001 ACTCTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGAC 2040

2041 GAGAGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCA 2080

2081 ACAGGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGAT 2120

2121 CACCATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTAC 2160

2161 GTCACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCT 2200

2201 ACTTCTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTT 2240

2241 CACCAGCTATCAACTlAGAGGCTACATCGiUlSACAGCCAA 2280

2281 GACCTTSJUAICTACTCGAT^^ 2320

2321 AGACCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACT 2360

2361 TTCT6CCCA^CTCCCATTGGCAAGTG7GG&Gfi.(aCC7iiayC 2400

2401 AGATCCGCTCCACACCTTGAGTGGAATCCTGACTTGGACT 2440

2441 GCTCCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCA 2480

2521 AATGAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGA 2560

2561 CCCAAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCT 2600

2601 CGAAGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTG 2640

2641 AAGAGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAAC 2680

2681 TCGAATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGA 2720

2721 GTCCGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAG 2760

2801 ACAAACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGA 2840

2841 GTTGTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAG 2880

2881 GAACTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACG 2920

2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960

2961 CCTCAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAG 3000

3001 GAACAC^CAAXCAGCGTTCCGTCCTGGTTGTGra 3040

3041 GGGAAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGG 3080

3081 TAGAGGCTACATTCTCCCTGTGACCGCTTACAAGGAGGGA 3120

3121 TACGGTGA6GG7TGCG7(^CCATCCACGA6ATCGAGAAC&. 3180

3161 ACACCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGA 3200

3201 AATCTATCCCAACAACACCGTTACTTGCAACGACTACACT 3240

3241 GTGAATCAGGAAGACTACGGAGGTGCCTACACTAGCCGTA 3280

3281 ACAGAGGTTACAACGAAGCTCCTTCCGTCCCTGCTGACTA 3320

3361 CGTGAGAACCCTTGCGAGTTTAACAGAGGTTACAGGGACT 3400

3401 ACACACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGA 3440

3441 GTACTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGT 3480

3481 GAAACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTC 3520

3521 TCTTGATGGAGGAA 3534.

H. A structural gene encoding an insecticidal protein of B.t.t., comprising the sequence:

1 ATGACTGCAGACAACAACACCGAAGCCCTCGACAGTTCTA 40

41 CCACTAAGGATGTTATCCAGAAGGGTATCTCCGTTGTGGG 80

121 CTCGTGAGCTTCTATACAAACTTTCTCAACACCATTTGGC 160

161 CAAGCGAGGACCCTTGGAAAGCATTCATGGAGCAAGTTGA 200

201 AGCTCTTATGGATCAGAAGATTGCAGATTATGCCAAGAAC 240 

241 AAGGCTTTGGCAGAACTCCAGGGCCTTCAGAACAATGTGG 280 


281 AGGACTACGTGAGTGCATTGTCCAGCTGGCAGAAGAACCC 320 

321 TGTTAGCTCCAGAAATCCTCACAGCCAAGGTAGGATCAGA 360

361 GAGTTGTTCTCTCAAGCCGAATCCCACTTCAGAAATTCCA 400

401 TGCCTAGCTTTGCTATCTCCGGTTACGAGGTTCTTTTCCT 440 

441 CACTACCTAIGCTCAAGCTGCCAACACCCACTTisTTTCTC 480 

481 CTTAAGGACGCTCAAATCTATGGAGAAGAGTGGGGATACC 520 

521 AGAAAGAGGACATTGCTGAGTTCTACAAGCGTCAACTTAA 560

561 GCTCACCCAAGAGTACACTGACCATTGCGTGAAATGGTAT 600

601 AACGTTGGTCTCGATAAGCTCAGAGGCTCTTCCTACGAGT 640

641 CTTGGGTGAACTTCAACAGATACAGGAGAGAGATGACCTT 680

721 GTGAGACTCTACC CAA&GGAAGTGA&AAC7GAGCTS&CCA 760 

761 GAGACGTGCTCACTGACCCTATTGTCGGAGTCAACAACCT 800

801 TAGGGGTTATGGAACTACCTTCAGCAATATCGAAAACTAC 840

841 ATTAGGAAACCACATCTCTTCGACIJ^CTTCACaGAXTTC 880 . 

881 AATTCCACaaAGGTTTC^CCaGGATACiaTGGXAACGA 920 

921 CTCCTTCAACTATTGGTCCGGTAACTATGTTTCCACCAGA 960

961 CCAAGCATTGGATCTAATGACATCATCACATCTCCCTTCT 1000

1001 ATGGTAACAAGTCCAGTGAACCTGTGCAGAACCTTGAGTT 1040


1041 CAACGGCGAGAAAGTCTATAGAGCCGTCGCAAAOCCAAX 1080 

1081 CTCGCTGTGTGGCC^CCGCAGTTTACTCAGGCGTCACAA 1120

1121 AGGTGGAGTTTAGTCAGTAIAACGArCAGACCGATGAGGC 1160 

1161 CAGCACCCAGACTTACGACTCCAAACGTAACGTTGGCGCA 1200

1201 GTCTCTTGGGATTCTATCGACCAATTGCCTCCAGAAACCA 1240 


1241 CAGACGAACCATTGGAGAAGGGCTACAGCCACCAAC2TAA 1280 


1281 CTATGTGATGTGCTTCTTGATGCAAGGTTCCAGAGGGACC 1320

1321 ATTCCAGTGTTGACCTGGACACACAAGTCCGTGGACTTCT 1360 

1361 TCAACATGATCGATAGCAAGAAGATCACTCAACTTCCCTT 1400

1441 GCAGGTCCCAGATTCACTGGAGGTGACATCATCCAGTGCA 1480

1481 CAGAGAACGGCAGCGCAGCTACTATCTACGTGACACCTGA 1520

1521 TGTGTCTTACTCTCAGAAGTACAGGGCACGTATTCATTAC 1560

1561 GCATCTACCAGCCAGATCACCTTCACACTCAGCTTGGATC 1600

1601 GAGCACCCTTCAACCAGTATTACTTTGACAAGACCATCAA 1640

. 1641 CAAAGGTGACACTCTOVCAIACAATAGCOTCAACTTGGCA 1680 

1681 AGTTTCAGCACACCATTTGAACTCTCAGGCAACAATCTTC 1720

1721 AGATCGGCGTCACCGGTCTCAGCGCCGGAGACAAAGTCTA 1760

1761 CATCGACAAGATTGAGTTCATCCCAGTGAAC 1791.

I. A structural gene encoding an insecticidal protein of B.t. entomocidus, comprising the sequence:

1 A2GGAGG&GAACAACCAA^ , 40 

41 GCTTGAGTAACCCAG^GAGGTATTGC1TGATG<3AGAACG fiti 


121 tTG5TCCAGTITCTGCTCAGC3aCTTCSTGCCaGGT6<3TO .'. 160 
1 61 GGTTCCTTGTCGGACSAATTGACTTCGUVrGGGGXAXCGT 200 

201 TGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATTGAG 240

241 CAGTTGATCAACGAGAGGATCGCTGAGTTCGCCAGGAACG 280


281 CTGCCATCGCTAACTTGGAAGGATTGGGCAATAACTTCAA 320 


321 CATCTATGTGGAGGCCTTCAAAGAGTGGGAAGAGGACCCT 360 


361 AACAACCCAGAGACCCGCACTAGGGTGATCGAOtfaTTCA 400 • 


401 GAATCTTGGACGGCCTCTTGGAGAGAGAIATCCCATCCTT 440 

441 CAGAATCTCTGGCTTCGAAGTTCCTCTCTTGTCCGTGTAC 480

■ 481 GCTCAAGCAC^TAATCTTCACCTCGCTATCCTTCGAGACA 520 

521 GTffrCArmTGCS^AAAGCTGGGGATTGACCaClATCAA 560 

561 CGTCAAIGAGAATTACAACAGACTTATCAGG^ 600 


601 GAGTACGCCGACCACTGTGCTAACACCTACAACCCrrGGCT 640 

641 TGAACAATCTCCCTAAGTCTACTTATCAAGATTGGATTAC 680

681 CTACAACAGGTTGAGGAGAGACTTGACCCTCACAGTTTTG 720

721 GACATTGCAGCTTTCTTCCCGAACTATGACAACAGGAGAT 760

761 ACCCTATCCAACCAGTGG5TCAACTTACCAi3AGAAGTCTA 800 

801 TACTGACCCACTTATCAACTTCAACCCTCAGTTGCAAAGT 840

841 GTCGCCCAACTTCCCACATTCAACGTCATGGAGTCCAGCC 880 

881 GTATCAGGAACCCACACTTGTTTGACATCTTGAACAACCT 920

921 TACTATCTTCACCGATTGGTTCAGCGTTGGGCGTAACTTC 960

961 TATTGGGGTGGACACAGGGTCATCTCCTCTCTTATTGGAG 1000

1001 GTGGGAACATTACCTCTCCTATCTATGGACGTGAGGCAAA 1040

1041 CCAGGAGCCACCACGTAGTTTCACCTTCAACGGTCCAGTC 1080

1081 TTCAGAACCTTGTCTAACCCTACCTTGAGATTGCTCCAfiC 1120 

1121 AACCTTGGCCAGCTCCACCTTTCAACCTTAGAGGTGTTGA 1160 

1161 GGGCGTTGAGTTCTCTACTCCTACCAACTCCTTCACTTAC 1200 

1201 AGAGGTAGAGGAACCGTTGATTCCTTGACCGAACTCCCAC 1240

1241 CAGAGGACAATAGCGTGCCACCCAGGGAAGGCTACTCCCA 1280

1281 CAGGTrcnrGCCACGC^CTTCGTGOiGCGTTCCGGAOT 1320 

1321 CCATTCCTCACTACAGGAGTTGTGTTCTCATGGAC7GAIC 1360 


1361 GTAG7GCTACTCTCACTJUITACCATTGATCCCG%GAGG&I 1400 


1401 CAAXCAAATCCCATTGGTCAAGGGTTTCCGTGTGTGGGGA 1440 


1441 GGAACTTCTGTCATCACAGG^CCAGGCTTCACAGGAGGTG 1480 

1481 ATATTCTTAGAAGAAACACTTTTGGCGACTTTGTGAGCCT 1520

1521 CCAAGTTAACATCAACTCTCCAATTACTCAAAGATATCCT 1560

1561 CTCAGGTTTCGTTACGCATCTTCCCGTGACGCTAGAGTCA 1600

1601 TCGTGCTCACCGGAGCAGCTTCTACCGGTGTCGGTGGACA 1640

1641 AGTCTCCGTGAACATGCCACTCCAGAAGACTATGGAGATC 1680

1681 GGCGAGAACTTGACATCCAGGACCTTCAGATACACCGACT 1720

1721 TCTCTAACCCTTTCAGTTTCCGTGCCAACCCTGACATCAT 1760

1761 TGGCATTAGCGAACAACCTCTCTTTGGAGCTGGTAGCATC 1800 

1801 TCATCTGGCGAATTGTACATrGACAJ^GATTGAGAl- CA TT C 1840 

1841 TTGCCGACGCTACCTTCGAGGCTGAGTCTGACCTTGAGAG 1880

1881 AGCCCAGAAGGCTGTGAACGCCCTCTTTACCTCCTCTAAT 1920

1921 CAGATTGGCTTGAAAACTGACGTTACTGACTATCACATTG 1960

1961 ACCAAGTGTCCAACTTGGTCGACTGCCTTAGCGATGAGTT 2000

2001 CTGCCTCGACGAGAAGCGTGAACTCTCCGAGAAAGTTAAA 2040

2041 CACGCCAAGCGTCTCAGCGACGAGAGGAATCTCTTGCAAG 2080

2081 ACCCCAACTTCAGAGGCATCAACAGGCAGCCAGACCGTGG 2120

2121 TTGGAGAGGAAGCACCGACATCACCATCCAAGGAGGCGAC 2160

2161 GATGTGTTCAAGGAGAACTACGTCACCCTCCCAjGGAACTa 2200 


2201 TGGACGAGTGCTACCCTACCIACTTGTACC^GAAGAICGA 2240 

22 41 TGAGTCCAAACTCAAAGCCTACACCAGGTAXGAACTTAGZL 2280 ■ 
2281 GGCTACATCGAAGACAGCCAAGACCTTGAAATCTACCTCA 2320 

2321 TCAGGTACAATGCCAAGCACGAGATCGTGAATGTCCCAGG 2360

2361 TACTGGTTCCCTCTGGCCACTTTCTGCCCAAATGCCCATT 2400

2401 GGGAAGTGTGGAGAGCCTAACAGATGCGCTCCACACCTTG 2440

2481 GAAGTGTGCCCACCATTCTCATCACTTCACCTTGGACATC 2520

2521 GATGTGGGATGTACTGACCTGAATGAGGACCTCGGAGTCT 2560 

2561 GGGTCATCTTCAAGATCAAGACCCAAGACGGACACGCAAG 2600 

2601 ACTTGGCAACCTCGAGTTTCTCGAAGAGAAACCATTGCTC 2640

2641 GGTGAAGCTCTCGCTCGTGTGAAGAGAGCAGAGAAGAAGT 2680

2681 GGAGGGACAAACGTGAGAAACTCCAACTCGAGACTAACAT 2720

2721 CGTTTACAAGGAGGCCAAAGAGTCCGTGGATGCTTTGTTC 2760



EP 0 385 962 B1 

27 61 GTGAACTCCCAATATGAIAGGTTGCAAG^ 2800 


.2801 TCGCC^GAICCACGCTGCAEacaAaCGTGTGCRaGGaT 2840 

2881 GTGAACGCTGCCATCrrOauaSAACTT^^ 29*6 

2921 TTACCGCATACTCCTTGTACGATGCCAGAAACGTCATCAA 2960

2961 GAACGGTGACTTCAACAATGGCCTCTTGTGCTGGAATGTG 3000

3001 AAAGGTCATGTGGACGTGGAGGAACAGAACAATCACCGTT 3040

3041 CCGTCCTGGTTATCCCTGAGTGGGAAGCTGAAGTGTCCCA 3080

3081 AGAGGTTAGAGTCTGTCCAGGTAGAGGCTACATTCTCCCT 3120

3161 CCATCCACGAGATCGAGGACAACACCGACGAGCTTAAGTT 3200

3201 CTCCAACTGCGTCGAGGAAGAAGTCTATCCCAACAACAGC 3240

3241 GTTACTTGCAACAACTACACTGGGACCOtfaSAAGAGTACG 3280 

3281 AAGGTACCTACACTAGCCCTAACCAAGCTTACGACGAACC 3320

3361 GTGTACGAGGAGAAATCCTACACAGATGGCAGACGTGAGA 3400

3401 ACCCTTGCGAGTCCAACAGAGGTTACGGTGACTACACACC 3440

3441 ACTTCCAGCAGGCTATGTTACCAAGGACCTTGAGTACTTT 3480

3481 CCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAAACCG 3520

3521 AGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCTTGAT 3560

3561 GGAGGAA 3567.

J. A structural gene encoding a P2 insecticidal protein, comprising the sequence:

1 ATGGACAACAACGTCTTGAACTCTGGTAGAACAACCATCT 40
41 GCGACGCATACAACGTCGTGGCTCACGATCCATTCAGCTT 80
81 CGAACACAA.G^GCCTCGACACTATTCAGAAGGAGTGGATG 120 

,121 G&ATGSAAACSg3lCT6ACCaCTCTCyCTACGTCSCaCCTG 150 


161 TGGTT^AACAGTGTCCAGCTTCCTTCTCAAGAAGGTCGS 200 

201 CTCTCTCATCGGAAAACGTATCTTGTCCGAACTCTGGGGT 240 

* . . • • ■ ■ • * • 

' 241 ATCATCT7TCCATCTGGGTCCACTAATCTCATGCAAGAC& . 280 

281 TCT7GAGGGAGACCGAACAGTTTCTCAACCAGCGTCTCAA 320 

321 CACTGATACCTTGGCTAGAGTCAACSCTGAGTTGATCGGT 360 

361 CTCCAAGCAAACATTCGTGAGTTCAACCAGCAAGTGGA£A 400 

401 ACTTCTTGAATCCAACTCAGAATCCTGTGCCTCTTTCCAT 440

441 CACTTCTTCCGTGAACACTATGCAGCAACTCTTCCTCAAC 480 

481 AGATTGCCTCAGTTTCAGATTCAAGGCTACCAGTTGCTCC 520

521 TTCTTCCACTCTTTGCTCAGGCTGCCAACATGCACTTGTC 560

561 CTTCATACGTGACGTGATCCTCAACGCTGACGAATGGGGA 600

601 ATC7CTGCaGCCACTCTTAGGACam^GACTaCTTG& 640

641 GGAACTAC^XCGTGATTACTCCAACTATTGCAlCAACftC 680 , 


681 TTA3CAG ACTGCCTTTCGTGGACT CAATAC33AGGCTTCRC > 720 


721 GACATGCTTGAGTTCAGGACCTACATGTTCCTTAACGTGT . 760 " 

761 TTGAOTACGTCAGCATTTGGAGTCTCTTCAAGTACCAGM '.. 800 . 

801 CTTGATGGTGTCCTCTGGAGCCAATCTCTACGCCTCTGGC 840
841 AGTGGACCACAGCAAACTCAGAGCTTCACAGCTCAGAACT 880

881 GGCCATTCTTG7A!TAGCTTGTTCCAA^GTCAACTCCAAC13l ' 920 

921 CATXCTCAGTGG7ATCTCTGGGACCAGACTCTCCATAACC 960 " 


961 TTTCCCAACATTGGTGGACTTCCAGGCTCCACTACAACCC 1000 


1001 AXAGCCTTAACTCTGCCAGAGTGAACTACAGTGGAGGTGT 1040. 


1041 CAGCTCTGGArrGATTGGTGCAAClAACTTGAACCACAAC 1080 

1081 TTCAATTGCTCCACCGTCTTGCCACCTCTGAGCACACCST 1120 


1121 TTGTGAGGTCCTGGCTTGACAGCGGTACTGATCGCGAACC 1160

1161 AGTTGCTACCTCTACAAACTGGCAAACCGAGTCCTTCCAA 1200 


1201 ACCACTCTTAGCCTTCGGTGTGGAGCTTTCTCTGCACGTG 1240 . 

1241 GGAATTCAAACTACTTTCCAGACTACTTCATTAGGAACA3P 1280 

1281 CTCTGGTGTTCCTCTCGTCATCAGGAATGAAGACCTCACC 1320

1321 CGTCCACTTCATTACAACCAGATTAGGAACATCGAGTCTC 1360 






1361 CATCCGGTACTCCAGGAGGTGCAAGAGCTTACCTCGTGTC 1400



1401 TGTCCATAACAGGAAGAACAACATCTACGCTGCCaACGaS 1440 



1441 AATGGCACCaTGATTCACCTTGCACCAGAAGflXEACACTG 1480 



1481 (^TTCACCAITCTCTCCAATCCATGCTACCCAAGTGkACAA 1520 



1521 TCAGACACGCACCTTCATCTCCGAAAAGTTCGGAAATCAA 1560 



1561 GGTGACTCCTTGAGGTTCGAGCAATCCAACACTACCGCTA. 1600 



1601 GGTACACTrTGAGAGGCAATGGAAACAGCTACAACCTTia 1640 



1641 CTTGAGAGTTAGCTCCATTGGTAACTCCACCATCCGTGTT 1680

1681 ACCATCAACGGACGTGTTTACACAGTCTCTAATGTGAACA 1720

1721 CTACAACGAACAATGATGGCGTTAACGACAACGGAGCCAG 1760

1761 ATTCAGCGACATCAACATTGGCAACATCGTGGCCTCTGAC 1800

1801 AACACTAACGTTACTTTGGACATCAATGTGACCCTCAATT 1840 



1881 AACTAACCTCCCTCCATTOTAC 1902 



K. A structural gene sequence encoding a fusion protein comprising the N-terminal 610 amino acids of B.t.k. HD-1 and the C-terminal 567 amino acids of B.t.k. HD-73, said gene comprising the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40

41 ACTGCTTGAGTJ^CCCAGAAGirrGAAGTACTTGCTGGAGJl 80 

. 81- ACGCA7TGAAACCGCTTACACTCCCATCGACATCTCCTTG 120 

121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160

161 CT6<^TTCGTTCTCG^AC7AG7TG^CATCATCTG<^GZ2T 200 

• ■ « • • 

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280

281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAG^AATCTCTA 320 

321 CCAAATCTAIGCAGAGAGCTTCAGAGAGTGGGAAGCCGAS 360 

361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400 

401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440 
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'441 GTTCGCAGTCCAG3ULCTaCCAAGTTCCTCTCTTGTCCSTS 480 

481 TACGTTCAAGCAGCXAArCTTCACCTCaGCGTGeTTCfia6 : 520 

521 ACGTXAGCGTGTTTGGGCAAAGGTGGGGAXTCG&TGCTGC 560 

«... .,•..*"-•••' 

561 AACCATCAATAGCCGTXACAACGAC CTTACTAGGCTGA2T 600 

■•' v ... 

601 GGAAACTACACCGACCACGCTGTTCGTTGGTACA^ 640 

■" •' 

641 GCrrGGAGCGTGTCTGGGGTCCTGATTCTAtSAG&TTGGai 680 

• . * - • ■'■ • ' "* 

681 TAGAraCAACCAGTTCAGGAGAG&ATTGACCCTCACAGTT 720 

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA. 760 

761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
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801 CXATACTAACCCAGTTCrrGAGAACrTCGaCGGTAGCTTC 540 

■ • • ■ ■ •' 

' •' • • • • 

.881 GACACTTGATGGACATCTTGAACMCATJ^CTATCTACAC 92.0 

•921 CGATGCTCACAGAGGA^GTATTACTGGTCTGGACACCAG 960 

■ • ' •. ' • • • 

961 AICATGGCCTCTCCAGTTGGAXTCAGCGGGCCCGAGTTTA 1000 

' # • . ■ . ■ . m . • 

• ' .*...' • 

1041 ACaACGTATCGOTGCTCaACrrAGGTCA<^TGTCTACaG& 1080 
1081 ACCTTGTCTTCCACCTXGTACAGAAGACCCTTCAATATCG 1120 
1121 GTATCAACAACCAGCAACTTTCC G TTCl'TGACGGAACAGA 1160 

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200 

1201 TACAGAAAGAGCGGAACCG7TGATTCCTTGGACGAAATCC 1240, 

1281 CCACAGGTXGAGC CACGTGTCCAXGTTCCGTXCCGGATTC 1320 

1321 AGCAACAGTTCCGTGAGCATCAXCAGAGCTCCTATGTTCT 1360 

• « • • ■ 

1361 CAXGGATTCATCGTAGTGCTGAGXTCAACAATATCATTCC 1400 

1401 TTCCTCTCAAAXCACCCAAATCC CATTG ACC AAGTCT ACT 1440 

1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480

1481 T CACAGGAGGTGATATTCTTAGAAG AACTXCTCCTGGCCA ' 1520 
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..IS21 GAITAGCACCCTCAGAGTTAACAXCACTGCACCAjCTTTCT 1560 
.• • • " '•■ •■ ' 

1561 CAAAGATATCGTGTCaGGaTTCGTEaCGCaTCTACCACTA 1600 

' • ■ ■ . " • • 

1601 ACTTGCAA3TCCACACCTCCATCGACt3GAAGGCCTATCAA 1640 

• ' • • • . • 

1641 . TCaCGCIAAOTCTCCGCA^^ 1680 

• -..".«. ' « , 

1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTICACTICTC 1720 

• ■ '. • 

1721 CrrJCAACrTCTCTaACGgi^TCaiU^lYll^CUll'TM 1760 

• • '• • 

1761 CGCTCATGTGTTCAACTCTGGCAATGAAGTGTACATTGAC 1800. 

1801 CGTATTGAGTTTGTGCCTGCCGAAGTraCCCTCGAGGCTQ 1840 

1841 AGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGCCCT 1880 

1881 CTTTACCTCCACCAATCAGCTTGG£TTGAAAACTAACGTT 1920 

1921 ACTGACTATCACATTGACCAAGTGTCCAACTTGGTCACCT 1960

1961 ACOTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGAACX 2000 

' *•' • •• 

2001 CTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGACGAG 2040 

• . ■ • • • ». 

2041 AGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCAACA 2080 

2081 GGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGATCAC 2120 

2121 CATCCAAGGAGGCGACGArGTGrrCAAGGAGAACTACGTC 2160 

2161 ACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCTACT 2200 

2201 TGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTTCAC 2240 ■ 

2241 CAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAAGAC 2280
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2281 CTTGaAATCTACTCGATCA*^^ 2320 

• • • • • 

2321 CCGTGAATGTCCCAGCTACTGGTTCCCTCTGGCCACTTTC 2360 

23 61 TGCCC^CTCCCATTGGGA3U3TGTGGafiAGCCT^a^ 2400 

• • • • 

2401 TGCGCTCCACACCTTGAGT6GAATCCTGACT7GGACTGCT 2440 

2441 CCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCATCJk 2480 

• ■ • • • 

2481 CTECTCCrrGGaCATCGaiGTGGGATGTACTGaCCTGAAT 2520 

2521 GAGGACCTCGGAGTCTGC<rrCArCTTCAAGATCA&GaCCC 2560 

• • ■ - • • 

2S61 2600 

2601 AGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTGAAjG 2640 

• • • • 

2641 AGAGCAGAGAAGAAGTGGAGGQ^CAAACGtTGAGAAA 2680 

• • • • 

2681 A^GGGAA£CTA&CA7CGTTXACAAG<3AGGCC&A&S&iGTC 2720 

• • • • 

. 2721 CGTGG^GCTTTGTTCGTGAACTCCCAAIAIG^CaGTTG 2760 

2761 O^GCCGACACCAAO^CGCCATGATCCACGCCGCAGACA 2800 

2801 AACGTGTGCAC^CATTCGTGAGGCTTACTTGCCTGAGTT 2840 

2841 GTCCCTGATCCCTGGTGTGAACGCrTGCCATCTTCGAGGAA 2880 

2881 CTIGAGGGaCGTATCTTIACCGCATTCTCCTTGTACGAXG 2920-. 

2921 CCAGAAACGTCATCAAGAACGGTGACTTCAACAATGGCCT 2960

2961 CAGCTGCTGGJU^TGTGAAAan!CATGTGGACGTGGAGGAA 3000 




■ • • • * • • \ 

3041 AAGCTGAiUn > GTCCCAAGAGGTTAGAGTCTGTCXaGGTafi 3080 

" ■ 8 ••' ' "' " ' ' ' 

• •. . • . 

3081 AGGCTACJOTCTCCGTtSn^C^ 3120" 

• - • • • 

« • 3121 GCTGAGGGTTGCGTGACCATCOCGASn'CGAGAACaACA 3160 

3161 CCGACGAGCTTAAGTTCTC CAAC7GC67C6AGGAAG2L2lAT 3200 

• * i ' '. : 
3201 CTATCCCAACAACACCGTTACTTGCAACGACIACACTGTG 3240 

. 3241 AATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTAACA 3280 

• '"• ■ . • • . 

3281 GAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTATGC 3320 
25 3321 CTCCGTGTACGAG^GAAAICCTACACAGaTGGCAGACGT 3360 

3361 6AGAACCCTTGCGA6TTCAACAGAGGTTAC3U3GGACTACA 3400 

■ 30 • * a . ■ • 

3401 CACCACTTCCAGTTGGCTATGTTACO^G<^GCTTGAGTA . 3440 

•'- • 
^ 3441 CTTTC^TGAGACCGACAAAGTGTGGATCGAGBTCGGTtS^ 3480 

3481 ACCGAGGGAACCTTC&TC6TGG&CAGCGTGG2UGCTXCTC3 3520 

3521 TGftXGG&GGaA 3531. 
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.1 ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCCT 40 

.41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80 ' 

.81 TGCTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGA . 120 

T C 

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160 

161 TTGAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 200 . 
C C C ' G C G 

201 GAACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240 
T 

241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280 

281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320 . 

321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360 ; 

361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAG 400 

cc c c 

401 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 440
G C C CC C CC C 

441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480

481 GCGACTAtCAATAGTCGTTATAATGATTTAACTAGGCTTA 520 ' 

521 TTGGCAACTATACAGATCATGCTGTACGCTGGTACAATAC 560

561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600

601 ATAAGATATAATCAATTTAGAAGAGAATTAACACTAACTG 640 
C G C C G C GC T 

641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 680

681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720
FIGURE 2A 
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721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760 

761 TTCGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAG 800

801 TCCACATTTGATGGATATACTTAATAGTATAACCATCTAT 840 

841 ACGGATGCTCATAGAGGAGAATATTATTGGTCAGGGCATC 880

C C CTC 

881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920
G C 

921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA . 960 

961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000

1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAATAT 1040 

1041 AGGGATAAATAATCAACAACTATCTGTTCTTGACGGGACA 1080 

c c c c 

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120 

1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160 . 

1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200

1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240 

1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280

1281 CTCTTGGATACATCGTAGTGCTGAATTTAATAATATAATT 1320 

G C C C C C 

1321 CCTTCATCACAAATTACACAAATACCTTTAACAAAATCTA 1360 
C C C AC C C G 

1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400 
FIGURE 2B 
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1401 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 1440 

1441 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAT 1480 

1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520

1521 AAATTTACAATTCCATACATCAATTGACGGAAGACCTATT 1560

cc t g ' c ' 

1561 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600 

1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640. 

1641 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680 

1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720
1721 ATCGAATTGAATTTGTTCCGGCA 1743 
FIGURE 2C 
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1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA. 40 
C C A C A C 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G A T C T 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
C C T C T C C C 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CT G A G GC C C G C G A 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
G C TC C C C C T 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240 
C A T C G G 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 
G G C G G C G C 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320
G C G G T 6 C 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360 
C C T GAGC C C 

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400 
C TC CC C G A 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C C TG C A C AT 

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480 
G C C G C C C G C G 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C A T C T CC CAGC GC TC 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C AGC G C T 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C C C CC T G 

601 GGCAACTATACAGATcATGCTGTaCGCTGGTACAATACGG 640 
A CCCCC TT .CT 

641 GATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGGAT 680 
C G C T T 



FIGURE 3A
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681 AAGATATAATCAATTTAGAAGAGAATTAACACTAACTGTA. 720 
T C C (3 C G 6 C C A T 

721, TTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAGAA 760 
G C T G C C CTCC 

761 CGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800 
C C T C T G C T C 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840 
C T TC T G C C C C C 

841 CGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAGTC 880 
T T T CAT C CTCC C C 

881 CACATTTGATGGATATACTTAATAGTATAACCATCTATAC 920 
. C C CT G C C T C 

921 GGATGCTCATAGAGGAGAATATTATTGGTCAGGGCATCAA 960 
C C G C T A C G 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 
C C .ATA CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040 
C T C C C 

.1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

1081 ACATTATCGTCCACCTTATATAGAAGACCTTTTAATATAG 1120 
C G T G C C C C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160
T C C C G T C A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320 
C CA G G C G C C C A C 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360 
C C TCC G C C C 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTCC 1400
A T G C C C 

FIGURE 3B
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1401 T TC AT C AC AAATT AC ACAAAT AC CT TT AAC AAAATCT ACT 1440 
C T C C C A G C G 

1441 AATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGGAT 1480. 
C A G C 

1481 TTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGCCA 1520 
C T A T 

1521 GATTTCAACCTTAAGAGTAAATATTACTGCACCATTATCA 1560 
AGC C C T C C C T T 

1561 CAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCACAA 1600 
T C G T A A 

1601 ATTTACAATTCCATACATCAATTGACGGAAGACCTATTAA 1640

c g * c c c c g c ; 

1641 TCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTAAT , 1680 
T C C C C TCA C C C C 

1681 TTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTACTC 1720 
G A C C A C C C 

1721 CGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTAAG 1760 
T C C T C C T C CC T 

1761 TGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAGAT 1800
C G T G C T C 

1801 CGAATTGAATTTGTTCCGGCAGAAGTAACCTTTGAGGCAG 1840
T G G T C T C T 

1841 AATAT 1845 
G C 



FIGURE 3C 
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1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40 
C C A C AC 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T CI 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
C C T C T C C C 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160 
CT G A G GC C C G C G A 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200 
G C TC C C C C T 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT . 240 
C A T C G G 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 
G G C G G C G C 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320
G C G G T G C 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360
C C T GAGC C C 

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400 
C TC CC C G A 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT .440 
C C T G C A C AT 

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480 
GC CGC C CGCG 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C AT C T .CC CAGC GC TC 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C AGC G C T 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
A C C C C CC T G 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG . 640 
A CC CCC TT CT 

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680
C G G C T T A 



FIGURE 4A 
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681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720 
T A C C G C 6 G C C A T 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760 
G C T GT C C CTCC 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800 
CC C T C T G C T C 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840 
C T TCT G C C C C C 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
T T T C A T C G CTCC C C 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920
C C CT G C T C 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960
C C A AG G C T A C G 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 
C C A T A CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040 
C T C C C 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120 
C G T C G C C C C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
T C C C G T C ' A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240 
G C T CT C C 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
C CA G 6 C G C C C A C 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360 
C C TCC G C C C 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400 
C G C C C C C 



FIGURE 4B
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1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440 

■ C ' 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480 

c c c c c 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520 
A C C C C C C 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560



1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600
• C A GA 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640
G T 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680
C C T C 

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720
C G C C C C C 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760 

c c c c 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840 
C G C 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 

A TGCG 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 
CTGT ACGTCTACA C AGCT G ACTC G CA TG 

1921 G 1921 



FIGURE 4C 
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1. GAAAGAATAGAAACTGGTTACACCCCAATCGATATTTCCT 40 
ATGGCC T C T C C C 

41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG .80 
CI 6 X G GC C C G C G A 

81 TGCTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGA .120 . 
6 C TC C C C C T 

121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160 
C A T C G G 

161 TTGAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 200 
G G- C G G C G C 

201 GAACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240. 
G C G G T G C 

241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG .28.0 
C C T GAGC C ' C 

281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320 
C TC CC C G A 

321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360
C C T G C A C A 

361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAG . 400 
T G C C G C C C G C 

401 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 440 
G C A T C T CC CAGC GC TC 

441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480' 
C AGC . G C T 

481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520 
AC C C C CC T G 

521 TTGGCAACTATACAGATTATGCTGTACGCTGGTACAATAC 560 
A C C 'CC C T T C 

561 GGGATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGG 600 
T C 6 G C T T 

601 GTAAGGTATAATCAATTTAGAAGAGAATTAACACTAACTG 640 
AT ACC GC G GCCA 

641 TATTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAG 680
T G C T GT C C CTCC 
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: 681 AAGATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720 
CC C T C T G C T C 

721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760 
C T TC T G C C C C 

761 TTCGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAG 800 
C TT TCAT.C G CTCC C 

801 TCCACATTTGATGGATATACTTAACAGTATAACCATCTAT 840
C C C CI G C T C 

841 ACGGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATC 880 
C C A AG G C T A C 

881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920 
G C C A T A CAGC C G 

921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960 
T C T C C C 

.961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000 
C . T C C 

1001 GAACATTATCGTCCACTTTATATAGAAGACCTTTTAATAT 1040
C G T C G C C C 

1041 AGGGATAAATAATCAACAACTATCTGTTCTTGACGGGACA 1080 
CTCCC G TC A 

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120 
G C C T T C 

1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160 
T G C T CT C 

1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200 
C A C T C C 

1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240 
TCC CA G G C G C CCA 

1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280 
C C C TCC G C C C 

1281 CTCTTGGATACATCGTAGTGCTGAATTTAATAATATAATT 1320
C GC CCCC 

1321 GCATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360 
C 

1361 ACTTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATT 1400 
C C C c 
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1401 TACTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAAT 1440- 
C A C C C C C C 

1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480 



1481 TCCCATCGACATCTACCAGATATCGAGTTCGTGTACGGTA 1520. , 
C A . GA 

1521 TGCTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGT 1560. ...... 

6 T 

1561 AATTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTA 1600 
C C T . . 

1601 CGTCATTAGATAATCTACAATCAAGTGATTTTGGTTATTT 1640 
C C G C C C C C 

1641 TGAAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATA 1680 

c c c c 

1681 GTAGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAA 1720. 
G. C T 

1721 TAGACAGATTTGAATTTATTCCAGTTACTGCAACACTCGA 1760
C C G C 

1761 GGCTGAA 1767 
G 
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1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA . 40 
C C A C AC 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G A T ; C T 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
C C T C T C C C 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CT G A G GC C C G C G A 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200 
GC TCC CCC T 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT . 240 
C A T C G G 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 
G G C G G C . G C 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320 
G C G G T G C 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360
C C T GAGC C C 

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400
C TC CC C G A 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440 
C C T G C A C AT 

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480 
G C C G C C C G C G 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C A T C T CC CAGC GC TC 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C AGC G C T 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C C C CC T G 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640
A C CCCC TT CT 

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680 
C G G C T T A 
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681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720 - ■/' 
T A C C G C G G C C A T 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760
G C T GT C C CTCC 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800 
CC C T C T G C T C 

. 801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840 
C T TC T G C C C C C 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
TTTC A TC G CTCC C C 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920
C C; CT G C T C 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960 ' 
C C A AG G C T A C G 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 
C C A T A CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040 
C T C C C 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120. 
C G T C G C C C C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
T C C C G T C A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 . 
A C T C CTC 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320 
C CA G G C G C C C A C 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360 
C C TCC G C C C 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400 
C G C C C C C 
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1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440 

C •• • ■ 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480

c c c c c 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520
A C C C C C C 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 



1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 
C . A GA 

.1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640 
G T 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680 
C C T C 

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720
C G C C C C C 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760

c c c c 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840
C 6 C ' 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040 

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080 

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120 
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2121 TACC ATCC AAGGAGGGG ATGACGTATTTAAAGAAAATTAC 2160 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240 . 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320. 

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400 

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520 

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560 

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640 . 

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT * 2680 

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760 

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800 

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840 
FIGURE 9D 
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2841. GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920 

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960 

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA. 3120 

3121 . TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440 

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534 
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1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40 
C C A • . C , A C 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T C T 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
C C T C T C C C 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CT G AG GC C C G C G A 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
G C TC C C C C T 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240/- 
C A T ' C G G : 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 . 
G G C G G C G C 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTtTA 320 
G C G G T G C 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360 
C C T GAGC C C 

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT . 400 . : 
C TC CC C G A 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C C T G C A C AT 

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480 
G C C G C C C G C G 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C A T C T CC CAGC GC TC 

521 . ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC . 560 ; 
C AGC G C T 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600
A C C C C CC T G 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640 
A C C CC C T T C T. 
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641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680 
C G G C T T A 

681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720 
T A C C G C G G C C A T 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760
G C T GT , C C CTCC 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800 
CC C T C T G C T C 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840 
C T TC T 6 C C C . C C 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
T T T C A T C G CTCC C C 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920
C C CT G C T C 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA .960 
C C A AG G C T A C G 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 
C C ATA CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040 
C T C C C 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120
CG.T C GC CC C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160
TC CCG TC A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
G C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320 
CCAGG CGC C CAC 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360
C C TCC G C C C 
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1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC .. 1400 
C G C C C C C 

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480

c c c c c 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520 
A C C C C C C 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 



1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 
C A GA 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640 

G ••■ T 

1641- TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680 
C C . T C 

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720
C G C C C C C 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760 

c c c c 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840 
C G C 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 



1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 

G C C C G C 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960
G C G G 

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000 
C CC CAGC G C 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040 
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
FIGURE 10C
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2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120 



2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC . 2160 
G T C G C G G C 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240
CC C C G G C G C G G 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280



2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320 
. C C Q CC C C 

2321- AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 23 60 

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400 

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA . 2480 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600 

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680

G G 

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720
G C C C C 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800

FIGURE 10D 
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2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840 

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920 

' . ■. . C C 

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960 
C . C C 6 C C C C 

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320 

3321 . TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400 

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440 

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520



3521 TCCTTATGGAGGAA 3534 
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i. ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40 
C C A C A C 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 ' 
C C G A T C T 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
C C T CTC C C 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG i60 
CT GAG GC C C G C G A • 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT' 200 
G C TC C C C C T 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240 
C A T C G G 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280
G G C G G C G C 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320 
G C G G T G C 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360 
C C T GAGC ' C C 

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400. 
C TC CC C G • A 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 4 40 
C C T G C A C AT 

4 41 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA .480 
G C C G C C C G C G 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C AT C T CC CAGC GC TC 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560
C AGC G C T 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
A C C CCCCT G 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640 
A C C CC C T T C T 

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680 
C G G C T T A 
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681 AAGGTATAATCAATTTAGAAGAGAATTAAC^CTAACTGTA 720 
T A C C G C G G C CAT 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760 
. G C T '61 C C CTCC 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CC C T C T G C T C 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840 
C T TC T G C C C C C 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880 
T T T C A T C G CTCC C C 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920 
C C CT G C T C 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960
'' C.. C A AG G C T A C G 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000 
C C ATA CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040 
C T C C C 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080 
C T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120 
C G T C G C C C C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
T C C C G T C A 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200 
6 C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280
A C T C CTC 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320 
C C A G G C G C C C A C 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360 
G C TCC G C C C 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400 
C 6 C C C C C

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480
C C C C C

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520
A C C C C C C 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560 



1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600
C A GA 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640 
G T 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680
C C T C

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720 
C G C C C C C

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760 

C C C C

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840 
C G C 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 
G C C T G C T C 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920
C C CCCTGTCTG T C 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960 
T T C C C C G C 

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000
C CC TAGC G C C C C G T 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040
CC T CC T CC 

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080 
GA G C CT G C C C C 

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120 
C G T T C C

2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160
C C C T G C G G C

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 
C C C A T C C C T C 

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240
C C G G G C C C 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280
C AG C T C C C C

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320 
C T C CG C A G C G C

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360 
G C G C T C C A

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
T TC C T G T C

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
A T G G C 

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
C C C CGC T 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520 
C G C G T C G 

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560 
C A C C C C 

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600 
CCA T C C T 

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640 
G C T T C 

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680
G A G G G G C 

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720 
C T C CGC 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760 
6 C G G C G C G 

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800 
G CCCCC CC C 

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840 
C G C T G CT

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
T C C T G C T C C C G 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920
C T G A C T C T G C 

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960
C C C G C C C C 

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 
C CAG T T G C G G 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040 
G T G C G G T G 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
T C G A A A 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
A A C T C G C T

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
C T G G C C 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
C C G T CTC C G A 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
C C C T T C C C C

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
G G G C AGC 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320 
CA T C T T C 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 
C C G C G G C C CA 

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
CTC C G C T C C 

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
A T C T C G GC T 

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
G TTG C AG C CT 

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520 
C G C C GC T 

3521 TCCTTATGGAGGAA 3534 
T G

1 ATGACTGCAGATAATAATACGGAAGCACTAGATAGCTCTA 40
C C C C C C C T

41 CAACAAAAGATGTCATTCAAAAAGGCATTTCCGTAGTAGG 80
C T G T C G G T C T G

81 TGATCTCCTAGGCGTAGTAGGTTTCCCGTTTGGTGGAGCG 120 
A C T G G T A T C C C 

121 CTTGTTTCGTTTTATACAAACTTTTTAAATACTATTTGGC 160
C GAGC C C C C C 

161 CAAGTGAAGACCCGTGGAAGGCTTTTATGGAACAAGTAGA 200
C G T A A C G T 

201 AGCATTGATGGATCAGAAAATAGCTGATTATGCAAAAAAT 240
TCT G T A C G C 

241 AAAGCTCTTGCAGAGTTACAGGGCCTTCAAAATAATGTCG 280 
G T G AC C G C G 

281 AAGATTATGTGAGTGCATTGAGTTCATGGCAAAAAAATCC 320 
G C C TCCAGC G G C 

321 TGTGAGTTCACGAAATCCACATAGCCAGGGGCGGATAAGA 360 
T C CA T C A TA C 

361 GAGCTGTTTTCTCAAGCAGAAAGTCATTTTCGTAATTCAA 400 
T C C TCC C CA A C 

401 TGCCTTCGTTTGCAATTTCTGGATACGAGGTTCTATTTCT 440 
AGC T C C T T C 

441 AACAACATATGCACAAGCTGCCAACACACATTTATTTTTA 480 
C T C T C C G C C 

481 CTAAAAGACGCTCAAATTTATGGAGAAGAATGGGGATACG 520 
T G C 6 

521 AAAAAGAAGATATTGCTGAATTTTATAAAAGACAACTAAA 560 
G G C G C C GC T T 

561 ACTTACGCAAGAATATACTGACCATTGTGTCAAATGGTAT 600 
G C C G C C G 

601 AATGTTGGATTAGATAAATTAAGAGGTTCATCTTATGAAT 640
C TC C GC C C T C C G 

641 CTTGGGTAAACTTTAACCGTTATCGCAGAGAGATGACATT 680 
G C A A CA G C

681 AACAGTATTAGATTTAATTGCACTATTTCCATTGTATGAT 720
G T GC C C T C C C C 

721 GTTCGGCTATACCCAAAAGAAGTTAAAACCGAATTAACAA 760 
GA A C G G T GC T C 

761 GAGACGTTTTAACAGATCCAATTGTCGGAGTCAACAACCT 800
GC C T C T 

801 TAGGGGCTATGGAACAACCTTCTCTAATATAGAAAATTAT 840
T T AGC C C C 

841 ATTCGAAAACCACATCTATTTGACTATCTGCATAGAATTC 880
AG C C T C 

881 AATTTCACACGCGGTTCCAACCAGGATATTATGGAAATGA 920 
C AA T C T C

921 CTCTTTCAATTATTGGTCCGGTAATTATGTTTCAACTAGA 960 

C C C C C

961 CCAAGCATAGGATCAAATGATATAATCACATCTCCATTCT 1000 
T T C C C 

1001 ATGGAAATAAATCCAGTGAACCTGTACAAAATTTAGAATT 1040
T C G G GCCT G 

1041 TAATGGAGAAAAAGTCTATAGAGCCGTAGCAAATACAAAT 1080 
C C C G C C C 

1081 CTTGCGGTCTGGCCGTCCGCTGTATATTCAGGTGTTACAA 1120
C T G A A T C C C

1121 AAGTGGAATTTAGCCAATATAATGATCAAACAGATGAAGC 1160 
G G T G C G C G 

1161 AAGTACACAAACGTACGACTCAAAAAGAAATGTTGGCGCG 1200 
C C C G T C C T C A 

1201 GTCAGCTGGGATTCTATCGATCAATTGCCTCCAGAAACAA 1240 
TCT C C 

1241 CAGATGAACCTCTAGAAAAGGGATATAGCCATCAACTCAA 1280
C AT G G C C C T

1281 TTATGTAATGTGCTTTTTAATGCAGGGTAGTAGAGGAACA 1320 
C G C G A TCC G C 

1321 ATCCCAGTGTTAACTTGGACACATAAAAGTGTAGACTTTT 1360 
T G C C GTCC G C

1361 TTAACATGATTGATTCGAAAAAAATTACACAACTTCCGTT 1400 
C C AGC G G C T C

1401 AGTAAAGGCATATAAGTTACAATCTGGTGCTTCCGTTGTC 1440
G G A C C C G

1441 GCAGGTCCTAGGTTTACAGGAGGAGATATCATTCAATGCA 1480 
C ACT T C C G 

1481 CAGAAAATGGAAGTGCGGCAACTATTTACGTTACACCGGA 1520 
G C C C A T C G T 

1521 TGTGTCGTACTCTCAAAAATATCGAGCTAGAATTCATTAT 1560 
T G G CA G AC T C 

1561 GCTTCTACATCTCAGATAACATTTACACTCAGTTTAGACG 1600
A CAGC C C C C G T 

1601 GGGCACCATTTAATCAATACTATTTCGATAAAACGATAAA 1640 
A CCCGTC T C GCC 

1641 TAAAGGAGACACATTAACGTATAATTCATTTAATTTAGCA 1680 
C T TC C A C AGC C C G 

1681 AGTTTCAGCACACCATTCGAATTATCAGGGAATAACTTAC 1720 
T C C C C TC T 

1721 AAATAGGCGTCACAGGATTAAGTGCTGGAGATAAAGTTTA 1760
G C C TC C C C C C 

1761 TATAGACAAAATTGAATTTATTCCAGTGAAT 1791 
C C G G C C C

1 ATG AATAATGTATTGAATAGTGGAAGAACAACTATTT 40
GAC C C C CTC T C C 

41 GTGATGCGTATAATGTAGTAGCCCATGATCCATTTAGTTT 80 
C C A C C C G T C C C 

81 TGAACATAAATCATTAGATACCATCCAAAAAGAATGGATG 120
C C GAGCC C C T T G G G 

121 GAGTGGAAAAGAACAGATCATAGTTTATATGTAGCTCCTG 160 
A C T T C CTC C C C C A

161 TAGTCGGAACTGTGTCTAGTTTTTTGCTAAAGAAAGTGGG 200
G T A C C CCT C G C 

201 GAGTCTTATTGGAAAAAGGATATTGAGTGAATTATGGGGG 240
CTC C C C T C TCC C C T 

241 ATAATATTTCCTAGTGGTAGTACAAATCTAATGCAAGATA 280 
C C ATC GTCC T C C

281 TTTTAAGGGAGACAGAACAATTCCTAAATCAAAGACTTAA 320
C G C G T C C GCT C 

321 TACAGATACCCTTGCTCGTGTAAATGCAGAATTGATAGGG 360 
C T T G A A C C T G C T 

361 CTCCAAGCGAATATAAGGGAGTTTAATCAACAAGTAGATA 400
A C TC T C C G G C 

401 ATTTTTTAAACCCTACTCAAAACCCTGTTCCTTTATCAAT 440
C C G T A G T G CTC

441 AACTTCTTCGGTTAATACAATGCAGCAATTATTTCTAAAT 480
C C G C T CC C C C

481 AGATTACCCCAGTTCCAGATACAAGGATACCAGTTGTTAT 520
G T T T C C CC

521 TATTACCTTTATTTGCACAGGCAGCCAATATGCATCTTTC 560 
TC T AC C T T C CT G 

561 TTTTATTAGAGATGTTATTCTTAATGCAGATGAATGGGGT 600
C C AC T C G C C C T C A

601 ATTTCAGCAGCAACATTACGTACGTATCGAGATTACCTGA 640 
C T C TC TA G A CA C T 

641 GAAATTATACAAGAGATTATTCTAATTATTGTATAAATAC 680 
G C C TC T C C C C C C

681 GTATCAAACTGCGTTTAGAGGGTTAAACACCCGTTTACAC 720
T G C C T AC C T TA GC T

721 GATATGTTAGAATTTAGAACATATATGTTTTTAAATGTAT 760
C C T G C G C C CC T C 6 

761 TTGAATATGTATCCATTTGGTCATTGTTTAAATATCAGAG 800
G C CAG AGTC C C G C 

801 TCTTATGGTATCTTCTGGCGCTAATTTATATGCTAGCGGT 840
CT G G C A C C C C CTCT C

841 AGTGGACCACAGCAGACACAATCATTTACAGCACAAAACT 880
A T GAGC C T G 

881 GGCCATTTTTATATTCTCTTTTCCAAGTTAATTCGAATTA 920
C G AGCT G C C C C

921 TATATTATCTGGTATTAGTGGTACTAGGCTTTCTATTACC 960 
C TC CAG CTC G C A C C A 

961 TTCCCTAATATTGGTGGTTTACCGGGTAGTACTACAACTC 1000 
T C C AC T A CTCC C 

1001 ATTCATTGAATAGTGCCAGGGTTAATTATAGCGGAGGAGT 1040
AGCC T CTC A GCC T T 

1041 TTCATCTGGTCTCATAGGGGCGACTAATCTCAATCACAAC 1080 
CAGC AT G T T A CT G C 

1081 TTTAATTGCAGCACGGTCCTCCCTCCTTTATCAACACCAT 1120 
C TC C T G A C GAGC G 

1121 TTGTTAGAAGTTGGCTGGATTCAGGTACAGATCGAGAGGG 1160 
G GTCC T CAGC T C A 

1161 CGTTGCTACCTCTACGAATTGGCAGACAGAATCCTTTCAA 1200 
A A C A C G C 

1201 ACAACTTTAAGTTTAAGGTGTGGTGCTTTTTCAGCCCGTG 1240 
C C T CC TC A C T A 

1241 GAAATTCAAACTATTTCCCAGATTATTTTATCCGTAATAT 1280
G CT CCCTAGC 

1281 TTCTGGGGTTCCTTTAGTTATTAGAAACGAAGATCTAACA 1320
C T C C C C G T C C C 

1321 AGACCGTTACACTATAACCAAATAAGAAATATAGAAAGTC 1360
C T AC T T C GTGCC GTC 

1361 CTTCGGGAACACCTGGTGGAGCACGGGCCTATTTGGTATC 1400 
ACTTAAT AATCCCG

1401 TGTGCATAACAGAAAAAATAATATCTATGCCGCTAATGAA 1440
C G G C C C T C C G 

1441 AATGGTACTATGATCCATTTGGCGCCAGAAGATTATACAG 1480
C C T CC T A C T

1481 GATTTACTATATCGCCAATACATGCCACTCAAGTGAATAA 1520 
C C C T C T C C

1521 TCAAACTCGAACATTTATTTCTGAAAAATTTGGAAATCAA 1560
G A C C C C C G C

1561 GGTGATTCCTTAAGATTTGAACAAAGCAACACGACAGCTC 1600
C G G C G TC T C A 

1601 GTTATACGCTTAGAGGGAATGGAAATAGTTACAATCTTTA 1640 
G C TT G C C C C 

1641 TTTAAGAGTATCTTCAATAGGAAATTCAACTATTCGAGTT 1680 
C G TAGC C T T C C C C T 

1681 ACTATAAACGGTAGAGTTTATACTGTTTCAAATGTTAATA 1720
C C AC T C A C T G C 

1721 CCACTACAAATAACGATGGAGTTAATGATAATGGAGCTCG 1760 
T A G C T C C C C CA 

1761 TTTTTCAGATATTAATATCGGTAATATAGTAGCAAGTGAT 1800 
A CAGC C C C T C C C G CTC C 

1801 AATACTAATGTAACGCTAGATATAAATGTGACATTAAACT 1840 
C C T TT G C C CC C T 

1841 CCGGTACTCCATTTGATCTCATGAATATTATGTTTGTGCC 1880
T A C C 

1881 AACTAATCTTCCACCACTTTAT 1902 
C C T T G C

1 ATGGAGGAAAATAATCAAAATCAATGC^TACCTTACAATT 40
G C C C T A C 

41 GTTTAAGTAATCCTGAAGAAGTACTTTTGGATGGAGAACG 80 
C G C A G T GC T

81 GATATCAACTGGTAATTCATCAATTGATATTTCTCTGTCA 120 
C T C C T C C C C CT C 

121 CTTGTTCAGTTTCTGGTATCTAACTTTGTACCAGGGGGAG 160
T G C CAGC C G T T 

161 GATTTTTAGTTGGATTAATAGATTTTGTATGGGGAATAGT 200
G CC T C C T C C C T C 

201 TGGCCCTTCTCAATGGGATGCATTTCTAGTACAAATTGAA 240 
T A C G G G

241 CAATTAATTAATGAAAGAATAGCTGAATTTGCTAGGAATG 280 
G G C C G G C G C C C 

281 CTGCTATTGCTAATTTAGAAGGATTAGGAAACAATTTCAA 320 
C C C G G C T C 

321 TATATATGTGGAAGCATTTAAAGAATGGGAAGAAGATCCT 360 
C C G C C G G C

361 AATAATCCAGAAACCAGGACCAGAGTAATTGATCGC1TTC 400
C G CC T G G C CAA CA 

401 GTATACTTGATGGGCTACTTGAAAGGGACATTCCTTCGTT 440 
A CT G C C CT G G A T C A C 

441 TCGAATTTCTGGATTTGAAGTACCCCTTTTATCCGTTTAT 480
CA C C C T T C G G C 

481 GCTCAAGCGGCCAATCTGCATCTAGCTATATTAAGAGATT 520 
AT T C C CC TC CA 

521 CTGTAATTTTTGGAGAAAGATGGGGATTGACAACGATAAA 560 
G C C G G C T C 

561 TGTCAATGAAAACTATAATAGACTAATTAGGCATATTGAT 600 
C G T C C T C C C 

601 GAATATGCTGATCACTGTGCAAATACGTATAATCGGGGAT 640 
G C C C T C C C C T C 

641 TAAATAATTTACCGAAATCTACGTATCAAGATTGGATAAC 680 
G C CC T G T T 

681 ATATAATCGATTACGGAGAGACTTAACATTGACTGTATTA 720
C C CA G GA G CC C A T G

721 GATATCGCCGCTTTCTTTCCAAACTATGACAATAGGAGAT 760
C T A C G C 

761 ATCCAATTCAGCCAGTTGGTCAACTAACAAGGGAAGTTTA 800
C T C A G T C A C

801 TACGGACCCATTAATTAATTTTAATCCACAGTTACAGTCT 840
T C T C C C T G AAG 

841 GTAGCTCAATTACCTACTTTTAACGTTATGGAGAGCAGCC 880
C C CTC A C C TC 

881 GAATTAGAAATCCTCATTTATTTGATATATTGAATAATCT 920
T C G C A C G C C C C 

921 TACAATCTTTACGGATTGGTTTAGTGTTGGACGCAATTTT 960 
T C C C C G T C C 

961 TATTGGGGAGGACATCGAGTAATATCTAGCCTTATAGGAG 1000 
T CA G C C C1CT T 

1001 GTGGTAACATAACATCTCCTATATATGGAAGAGAGGCGAA 1040
G T C C C T A

1041 CCAGGAGCCTCCAAGATCCTTTACTTTTAATGGACCGGTA 1080 
A C TAGT C C C C T A C 

1081 TTTAGGACTTTATCAAATCCTACTTTACGATTATTACAGC 1120
CACGTC C G A GC C

1121 AACCTTGGCCAGCGCCACCATTTAATTTACGTGGTGTTGA 1160
T T C CC TA A

1161 AGGAGTAGAATTTTCTACACCTACAAATAGCTTTACGTAT 1200 
GCT GC T C CTC C T C 

1201 CGAGGAAGAGGTACGGTTGATTCTTTAACTGAATTACCGC 1240 
A T AC C G C C C A 

1241 CTGAGGATAATAGTGTGCCACCTCGCGAAGGATATAGTCA 1280 
A C C CA G C CTCC 

1281 TCGTTTATGTCATGCAACTTTTGTTCAAAGATCTGGAACA 1320 
CA G G C C C C G GC T C T 

1321 CCTTTTTTAACAACTGGTGTAGTATTTTCTTGGACCGATC 1360 
A CC C T A A T G C A T 

1361 GTAGTGCAACTCTTACAAATACAATTGATCCAGAGAGAAT 1400 
T CTC C G

1401 TAATCAAATACCTTTAGTGAAAGGATTTAGAGTTTGGGGG 1440
C CAGCG T CCTG A 

1441 GGCACCTCTGTCATTACAGGACCAGGATTTACAGGAGGGG 1480 
AT C C C T 

1481 ATATCCTTCGAAGAAATACCTTTGGTGATTTTGTATCTCT 1520
T A C T C C GAGC 

1521 ACAAGTCAATATTAATTCACCAATTACCCAAAGATACCGT 1560 
C T C C C T T T 

1561 TTAAGATTTCGTTACGCTTCCAGTAGGGATGCACGAGTTA 1600 
C C G A TTCCC T C TA C 

1601 TAGTATTAACAGGAGCGGCATCCACAGGAGTGGGAGGCCA 1640 
C 6CC C C ATT C T C T A 

1641 AGTTAGTGTAAATATGCCTCTTCAGAAAACTATGGAAATA 1680 
. CTCC G C A C G 6 C 

1681 GGGGAGAACTTAACATCTAGAACATTTAGATATACCGATT 1720
C G C 6 C C C C 

1721 TTAGTAATCCTTTTTCATTTAGAGCTAATCCAGATATAAT 1760 
CTC C CAGT CC T C C T C C 

1761 TGGGATAAGTGAACAACCTCTATTTGGTGCAGGTTCTATT 1800 
CTC C A T AGC C 

1801 AGTAGCGGTGAACTTTATATAGATAAAATTGAAATTATTC 1840 
TCATCT C T G C T C G G C 

1841 TAGCAGATGCAACATTTGAAGCAGAATCTGATTTAGAAAG 1880 
T C C T CC C G T G ACA CC T G 

1881 AGCACAAAAGGCGGTGAATGCCCTGTTTACTTCTTCCAAT 1920 

C G T C C C CA 

1921 CAAATCGGGTTAAAAACCGATGTGACGGATTATCATATTG 1960 
GC T C G TA C T T C C 

1961 ATCAAGTATCCAATTTAGTGGATTGTTTATCAGATGAATT 2000 
C G C G CACC ACC TAGC G 

2001 TTGTCTGGATGAAAAGCGAGAATTGTCCGAGAAAGTCAAA 2040 
C C C C G T C C T 

2041 CATGCGAAGCGACTCAGTGATGAGCGGAATTTACTTCAAG 2080 
C C T CCA C CT G 

2081 ATCCAAACTTCAGAGGGATCAATAGACAACCAGACCGTGG 2120 
CT C A AC C G G A

2121 CTGGAGAGGAAGTACAGATATTACCATCCAAGGAGGAGAT 2160
T G T C C 66 C C C 

2161 GACGTATTCAAAGAGAATTACGTCACACTACCGGGTACCG 2200 
T G G C C CT C A TT 

2201 TTGATGAGTGCTATCCAACGTATTTATATCAGAAAATAGA 2240 
C C C T C C G C G C 

2241 TGAGTCGAAATTAAAAGCTTATACCCGTTATGAATTAAGA 2280 
C C C C'TC AG C C T 

2281 GGGTATATCGAAGATAGTCAAGACTTAGAAATCTATTTGA 2320
C C C C C T C C 

2321 TCCGTTACAATGCAAAACACGAAATAGTAAATGTGCCAGG 2360 
AG C G 6 CC G C 

2361 CACGGGTTCCTTATGGCCGCTTTCAGCCCAAATGCCAATC 2400 
T T CC A T TCT C T 

2401 GGAAAGTGTGGAGAACCGAATCGATGCGCGCCACACCTTG 2440
G G T CA T 

2441 AATGGAATCCTGATCTAGATTGTTCCTGCAGAGACGGGGA 2480
Q CT 6 C C ■ G T C 

2481 AAAATGTGCACATCATTCCCATCATTTCACCTTGGATATT 2520 
G G C C T C T . C C 

2521 GATGTTGGATGTACAGACTTAAATGAGGACTTAGGTGTAT 2560 
G T C G C C A C 

2561 GGGTGATATTCAAGATTAAGACGCAAGATGGCCATGCAAG 2600 
C C C C C A C 

2601 ACTAGGGAATCTAGAGTTTCTCGAAGAGAAACCATTATTA 2640
T C C T GG C

2641 GGGGAAGCACTAGCTCGTGTGAAAAGAGCGGAGAAGAAGT 2680
T T C 6 A 

2681 GGAGAGACAAACGAGAGAAACTGCAGTTGGAAACAAATAT 2720 
6 T CG AG T C 

2721 TGTTTATAAAGAGGCAAAAGAATCTGTAGATGCTTTATTT 2760 
C C G C G CG G C 

2761 GTAAACTCTCAATATGATAGATTACAAGTGGATACGAACA 2800 
G C CAG G CC C C 

2801 TCGCCATGATTCATGCGGCAGATAAACGCGTTCATAGAAT 2840 
CCC C TGCC

2841 CCGGGAAGCGTATCTGCCAGAGTTGTCTGTGATTCCAGGT 2880
T T G T CT T C C T

2881 GTCAATGCGGCCATTTTCGAAGAATTAGAGGGACGTATTT 2920 
G C T C G C T C 

2921 TTACAGCGTATTCCTTATATGATGCGAGAAATGTCATTAA 2960 
C A TC G C C C C 

2961 AAATGGCGATTTCAATAATGGCTTATTATGCTGGAACGTG 3000 
G C T C C C CAGC T 

3001 AAAGGTCATGTAGATGTAGAAGAGCAAAACAACCACCGTT 3040
G C 6 G A G T G

3041 CGGTCCTTGTTATCCCAGAATGGGAGGCAGAAGTGTCACA 3080 
C G G G T G A T C 

3081 AGAGGTTCGTGTCTGTCCAGGTCGTGGCTATATCCTTCGT 3120 
X A A A C T C 

3121 GTCACAGCATATAAAGAGGGATATGGAGAGGGCTGCGTAA 3160
G C T C G C T T G 

3161 CGATCCATGAGATCGAAGACAATACAGACGAACTGAAATT 3200 
C C GA C C G T G 

3201 CAGCAACTGTGTAGAAGAGGAAGTATATCCAAACAACACA 3240
TC C C G A A C C C

3241 GTAACGTGTAATAATTATACTGGGACTCAAGAAGAATATG 3280 
T T C CGC C T A G G C 

3281 AGGGTACGTACACTTCTCGTAATCAAGGATATGACGAAGC 3320 
GA G C AGC CAG T CA 

3321 CTATGGTAATAACCCTTCCGTACCAGCTGATTACGCTTCA 3360 
TCC TCXXXXXXXXXXXX T T C T C C 

3361 GTCTATGAAGAAAAATCGTATACAGATGGACGAAGAGAGA 3400 
G C G G C C CA C T 

3401 ATCCTTGTGAATCTAACAGAGGCTATGGGGATTACACACC 3440 
C CGTC TCA C 

3441 ACTACCGGCTGGTTATGTAACAAAGGATTTAGAGTACTTC 3480
T A T C T C GC 7 T 

3481 CCAGAGACCGATAAGGTATGGATTGAGATCGGAGAAACAG 3520 
T CAG C T C 

3521 AAGGAACATTCATCGTGGATAGCGTGGAATTACTCCTTAT 3560
G C C GC T T G

1 AGATCTAGAGGTAATTGTTATGAGTACTGTCGTGGTTAAG 40
GATC 

41 GGAAACGTCAACGGTGGTGTACAACAACCTAGAAGGAGGA 80
G T A

81 GAAGGCAATCCCTTCGCAGGAGGGCTAACAGAGTACAGCC 120
T A T

121 AGTGGTTATGGTCACTGCTCCTGGCGAACdCAGGAGGAGG 160
GC A A A

161 AGACGCAGAAGAGGAGGCAATCGCAGGTCAAGAAGAACTG 200
A G T A

201 GAGTTCCCAGGGGAAGGGGCTCAAGCGAGACATTCGTGTT 240 
A A T 

241 TACAAAGGACAACCTCGTGGGCAACTCCCAAGGAAGTTTC 280 

281 ACCTTCGGACCAAGTGTATCAGACTGTCCAGCATTCAAGG 320 
T 

321 ATGGAATACTCAAGGCCTACCATGAGTACAAGATCACAAG 360 

T

361 TATCCTTCTTCAGTTCGTCAGCGAGGCCTCTTCCACCTCA 400 
T G T 

401 CCAGGATCCATCGCTTATGAGTTGGACCCACATTGCAAAG 440
C A T 

441 TATCATCCCTCCAGTCCTACGTCAACAAGTTCCAAATCAC 480
T 

481 AAAGGGAGGAGCTAAGACCTATCAAGCTAGGATGATCAAC 520 
T T C T 

521 GGAGTAGAATGGCACGATTCATCTGAGGATCAGTGCAGGA 560 
T T A 

561 TACTTTGGAAAGGAAGTGGAAAATCTTCAGACCCAGCAGG 600
C A G T T

601 ATCTTTCAGAGTCACCATCAGAGTGGCTCTTCAAAACCCC 640 
T T A 

641 AAGTAATAGACTCCGGATCAGAGCCTGGTCCAAGCCCACA 680 
AT

681 ACCAACACCCACTCCAACTCCCCAAAAGCATGAGCGATTT 720

721 ATTGCTTACGTCGGCATACCTATGCTGACCATTCAAGAAT 760

761 TC 762

FIGURE 16B